Created attachment 243221 [details] analyze_suspend output on 4.4.0-23-generic On resume from S3, acpi_ps_parse_aml consumes over 600 ms during the resume_machine phase, and thenconsumes over 100 ms during resume_noirq phase. analyze_suspend output suggests this is due to EC operations.
Created attachment 243231 [details] acpidump output bios_date:09/13/2016 bios_vendor:Dell Inc. bios_version:1.0.7 board_name:0839Y6 board_serial:/86S1CG2/CN129636AM0170/ board_vendor:Dell Inc. board_version:A00 chassis_serial:86S1CG2 chassis_type:9 chassis_vendor:Dell Inc. product_name:XPS 13 9360
Created attachment 243291 [details] analyze_suspend output on 4.9-rc3 reproduced with 4.9-rc3 ACPI seems measurably faster in this kernel, but is still 238 ms + 33 ms
Not sure if it is tunable. Please try it again with acpi.trace_state=enable and upload the dmesg output for the suspend/resume process. Thanks and best regards
I'm checking the acpidump. In acpi_pm_finish, _SST and _WAK will be invoked. While in this case, it's only about _WAK as there is no _SST in the acpidump. Let me paste _WAK here. Method (_WAK, 1, NotSerialized) // _WAK: Wake { RWAK (Arg0) \_SB.PCI0.NWAK (Arg0) \_SB.PCI0.LPCB.SWAK (Arg0) WAK (Arg0) Return (WAKP) /* \WAKP */ } I can confirm that NWAK/SWAK/WAK are all trivial. OperationRegion (GNVS, SystemMemory, 0x7AC00000, 0x0698) Field (GNVS, AnyAcc, Lock, Preserve) { Offset (0x697), AES3, 8 } Name (WAKP, Package (0x02) { Zero, Zero }) Scope (_SB) { Device (PCI0) { OperationRegion (HBUS, PCI_Config, Zero, 0x0100) Field (HBUS, DWordAcc, NoLock, Preserve) { PMLK, 1, , 3, PM0H, 2, Offset (0x81), PM1L, 2, , 2, PM1H, 2, Offset (0x82), PM2L, 2, , 2, PM2H, 2, Offset (0x83), PM3L, 2, , 2, PM3H, 2, Offset (0x84), PM4L, 2, , 2, PM4H, 2, Offset (0x85), PM5L, 2, , 2, PM5H, 2, Offset (0x86), PM6L, 2, , 2, PM6H, 2, } Scope (\_SB.PCI0) { Name (PALK, Zero) Name (PA0H, Zero) Name (PA1H, Zero) Name (PA1L, Zero) Name (PA2H, Zero) Name (PA2L, Zero) Name (PA3H, Zero) Name (PA3L, Zero) Name (PA4H, Zero) Name (PA4L, Zero) Name (PA5H, Zero) Name (PA5L, Zero) Name (PA6H, Zero) Name (PA6L, Zero) Method (NWAK, 1, NotSerialized) { PM0H = PA0H /* \_SB_.PCI0.PA0H */ PM1H = PA1H /* \_SB_.PCI0.PA1H */ PM1L = PA1L /* \_SB_.PCI0.PA1L */ PM2H = PA2H /* \_SB_.PCI0.PA2H */ PM2L = PA2L /* \_SB_.PCI0.PA2L */ PM3H = PA3H /* \_SB_.PCI0.PA3H */ PM3L = PA3L /* \_SB_.PCI0.PA3L */ PM4H = PA4H /* \_SB_.PCI0.PA4H */ PM4L = PA4L /* \_SB_.PCI0.PA4L */ PM5H = PA5H /* \_SB_.PCI0.PA5H */ PM5L = PA5L /* \_SB_.PCI0.PA5L */ PM6H = PA6H /* \_SB_.PCI0.PA6H */ PM6L = PA6L /* \_SB_.PCI0.PA6L */ PMLK = PALK /* \_SB_.PCI0.PALK */ } } Device (LPCB) { Method (SWAK, 1, NotSerialized) { SLPE = Zero AES3 = Zero If (RTCS) {} Else { Notify (PBTN, 0x02) // Device Wake } } OperationRegion (SMIE, SystemIO, PMBA, 0x04) Field (SMIE, ByteAcc, NoLock, Preserve) { , 10, RTCS, 1, } OperationRegion (SLPR, SystemIO, SMCR, 0x08) Field (SLPR, ByteAcc, NoLock, Preserve) { , 4, SLPE, 1, } } } } Method (WAK, 1, NotSerialized) { } But RWAK is very big and thus it's very difficult to figure out what's going wrong by reviewing the ASL. Method (RWAK, 1, Serialized) { D8XH (One, 0xAB) ADBG ("_WAK") \_SB.PCI0.GEXP.INVC () If (S0ID == One) { \_SB.SCGE = One } If (NEXP) { If (OSCC & 0x02) { \_SB.PCI0.NHPG () } If (OSCC & 0x04) { \_SB.PCI0.NPME () } } If (Arg0 == 0x03) { SSMP = 0x0E } If (Arg0 == 0x03) {} If ((Arg0 == 0x03) || (Arg0 == 0x04)) { If (GBSX & 0x40) { \_SB.PCI0.GFX0.IUEH (0x06) } If (GBSX & 0x80) { \_SB.PCI0.GFX0.IUEH (0x07) } If (CondRefOf (\_PR.DTSE)) { If (\_PR.DTSE && (TCNT > One)) { TRAP (0x02, 0x14) } } OperationRegion (NVID, SystemMemory, 0xE0100000, 0x02) Field (NVID, ByteAcc, NoLock, Preserve) { VVID, 16 } If (OSYS >= 0x07D9) { If (VVID == 0x10DE) { If (NHDA == One) { \_SB.PCI0.PEG0.PEGP.MLTF = One } Else { \_SB.PCI0.PEG0.PEGP.MLTF = Zero } } } If (TBTS == One) { Acquire (OSUM, 0xFFFF) \_GPE.TINI () Release (OSUM) } If (\_SB.PCI0.RP01.VDID != 0xFFFFFFFF) { Notify (\_SB.PCI0.RP01, Zero) // Bus Check } If (\_SB.PCI0.RP02.VDID != 0xFFFFFFFF) { Notify (\_SB.PCI0.RP02, Zero) // Bus Check } If (\_SB.PCI0.RP03.VDID != 0xFFFFFFFF) { Notify (\_SB.PCI0.RP03, Zero) // Bus Check } If (\_SB.PCI0.RP04.VDID != 0xFFFFFFFF) { Notify (\_SB.PCI0.RP04, Zero) // Bus Check } If (\_SB.PCI0.RP05.VDID != 0xFFFFFFFF) { Notify (\_SB.PCI0.RP05, Zero) // Bus Check } If (\_SB.PCI0.RP06.VDID != 0xFFFFFFFF) { Notify (\_SB.PCI0.RP06, Zero) // Bus Check } If (\_SB.PCI0.RP07.VDID != 0xFFFFFFFF) { If (DSTS == Zero) { Notify (\_SB.PCI0.RP07, Zero) // Bus Check } } If (\_SB.PCI0.RP08.VDID != 0xFFFFFFFF) { If (DSTS == Zero) { Notify (\_SB.PCI0.RP08, Zero) // Bus Check } } If (\_SB.PCI0.RP09.VDID != 0xFFFFFFFF) { Notify (\_SB.PCI0.RP09, Zero) // Bus Check } If (\_SB.PCI0.RP10.VDID != 0xFFFFFFFF) { Notify (\_SB.PCI0.RP10, Zero) // Bus Check } If (\_SB.PCI0.RP11.VDID != 0xFFFFFFFF) { Notify (\_SB.PCI0.RP11, Zero) // Bus Check } If (\_SB.PCI0.RP12.VDID != 0xFFFFFFFF) { Notify (\_SB.PCI0.RP12, Zero) // Bus Check } If (\_SB.PCI0.RP13.VDID != 0xFFFFFFFF) { Notify (\_SB.PCI0.RP13, Zero) // Bus Check } If (\_SB.PCI0.RP14.VDID != 0xFFFFFFFF) { Notify (\_SB.PCI0.RP14, Zero) // Bus Check } If (\_SB.PCI0.RP15.VDID != 0xFFFFFFFF) { Notify (\_SB.PCI0.RP15, Zero) // Bus Check } If (\_SB.PCI0.RP16.VDID != 0xFFFFFFFF) { Notify (\_SB.PCI0.RP16, Zero) // Bus Check } If (\_SB.PCI0.RP17.VDID != 0xFFFFFFFF) { Notify (\_SB.PCI0.RP17, Zero) // Bus Check } If (\_SB.PCI0.RP18.VDID != 0xFFFFFFFF) { Notify (\_SB.PCI0.RP18, Zero) // Bus Check } If (\_SB.PCI0.RP19.VDID != 0xFFFFFFFF) { Notify (\_SB.PCI0.RP19, Zero) // Bus Check } If (\_SB.PCI0.RP20.VDID != 0xFFFFFFFF) { Notify (\_SB.PCI0.RP20, Zero) // Bus Check } If (CondRefOf (\_SB.VMEN)) { \_SB.VMEN = 0xFF } } If ((Arg0 == 0x03) || (Arg0 == 0x04)) { EV2 (Arg0, Zero) If (ECG4 ()) { GENS (0x1C, One, Zero) } Else { GENS (0x1C, Zero, Zero) } } If (TBTS == One) { Signal (WFEV) } Return (Package (0x02) { Zero, Zero }) } To Len: Can we debug this by specifying acpi.trace_state=opcode-once acpi.trace_method=RWAK to narrow down the issue? So I probably need you to upload another suspend/resume dmesg with the following kernel parameter enabled: "acpi.trace_state=opcode-once acpi.trace_method=RWAK". Thanks
Created attachment 246941 [details] ftrace log for the test I obtained the ftrace log from Rui. Where ec_guard() is invoking schedule_timeout().
I was thinking the issue should have been fixed by this commit: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=df45db6177f8dde380d44149cca46ad800a00575 Which is in the upstream since v4.9-rc1.
In v4.9-rc3 test, acpi_pm_finish() consumption is down from 629.6ms to 219.4ms. But I'm not sure if this is related to the commit. We also need the v4.9-rc3 trace log to confirm. Thanks Lv
Created attachment 247141 [details] ftrace log for the 4.9-rc3 test Obtained the ftrace log from Len. Let me upload it.
I collected all ec_guard() invoked in acpi_pm_finish() here (line no prefixed): (447920): 999.836159 | 0) analyze-3422 | | acpi_pm_finish() { (277198): 999.646784 | 0) analyze-3422 | 2.435 us | } /* ec_guard */ (277270): 999.646827 | 0) analyze-3422 | 2.258 us | } /* ec_guard */ ... (1641938): 1000.100870 | 3) analyze-3422 | 2.195 us | } /* ec_guard */ (1642010): 1000.100896 | 3) analyze-3422 | 2.101 us | } /* ec_guard */ (1642078): 1000.100919 | 3) analyze-3422 | 1.220 us | } /* ec_guard */ (1789036): 1000.154475 | 3) analyze-3422 | 318316.6 us | } /* acpi_pm_finish */ They do not seem to be the culprit of the overall consumption (1.9us * 3058 < 318316.6 us). And we can clearly see delay invoked in ec_guard(): 999.646848 | 0) analyze-3422 | | ec_guard() { 999.646848 | 0) analyze-3422 | 0.032 us | __usecs_to_jiffies(); 999.646848 | 0) analyze-3422 | | ec_transaction_completed() { 999.646848 | 0) analyze-3422 | 0.038 us | _raw_spin_lock_irqsave(); 999.646849 | 0) analyze-3422 | 0.037 us | _raw_spin_unlock_irqrestore(); 999.646849 | 0) analyze-3422 | 0.694 us | } /* ec_transaction_completed */ 999.646849 | 0) analyze-3422 | 0.031 us | jiffies_to_usecs(); 999.646849 | 0) analyze-3422 | | __udelay() { 999.646850 | 0) analyze-3422 | 0.049 us | delay_tsc(); 999.646850 | 0) analyze-3422 | 0.349 us | } /* __udelay */ 999.646850 | 0) analyze-3422 | 2.265 us | } /* ec_guard */ So it seems the fix is working. I'm checking the culprit of the consumption. Thanks Lv
The main culprits are all related to AML execution: 999.836159 | 0) analyze-3422 | | acpi_pm_finish() { 999.837734 | 0) analyze-3422 | 1074.341 us | } /* acpi_ps_parse_loop */ 999.848387 | 1) analyze-3422 | 11900.16 us | } /* acpi_ps_parse_aml */ 999.848393 | 1) analyze-3422 | 11949.10 us | } /* acpi_ps_execute_method */ 999.848398 | 1) analyze-3422 | 12071.25 us | } /* acpi_ns_evaluate */ 999.848417 | 1) analyze-3422 | 12098.45 us | } /* acpi_evaluate_object */ 999.848417 | 1) analyze-3422 | 12098.94 us | } /* acpi_execute_simple_method */ 999.848417 | 1) analyze-3422 | 12159.44 us | } /* acpi_device_sleep_wake */ 999.848418 | 1) analyze-3422 | 12161.69 us | } /* acpi_disable_wakeup_device_power */ 999.857071 | 1) analyze-3422 | 8501.036 us | } /* acpi_ps_parse_aml */ 999.857075 | 1) analyze-3422 | 8532.092 us | } /* acpi_ps_execute_method */ 999.857078 | 1) analyze-3422 | 8614.325 us | } /* acpi_ns_evaluate */ 999.857092 | 1) analyze-3422 | 8633.444 us | } /* acpi_evaluate_object */ 999.857092 | 1) analyze-3422 | 8633.790 us | } /* acpi_execute_simple_method */ 999.857092 | 1) analyze-3422 | 8669.712 us | } /* acpi_device_sleep_wake */ 999.857093 | 1) analyze-3422 | 8671.104 us | } /* acpi_disable_wakeup_device_power */ 999.857093 | 1) analyze-3422 | 20848.90 us | } /* acpi_disable_wakeup_devices */ 999.861985 | 1) analyze-3422 | 3235.483 us | } /* acpi_ps_parse_loop */ 999.863760 | 1) analyze-3422 | 1015.387 us | } /* acpi_ds_exec_end_op */ 999.865399 | 1) analyze-3422 | 3221.256 us | } /* acpi_ps_parse_loop */ 999.867350 | 1) analyze-3422 | 1016.358 us | } /* acpi_ds_exec_end_op */ 999.868993 | 1) analyze-3422 | 3226.160 us | } /* acpi_ps_parse_loop */ 999.872214 | 1) analyze-3422 | 3039.087 us | } /* acpi_ps_parse_loop */ 999.875595 | 1) analyze-3422 | 3034.346 us | } /* acpi_ps_parse_loop */ 999.878788 | 1) analyze-3422 | 3011.461 us | } /* acpi_ps_parse_loop */ 999.882171 | 1) analyze-3422 | 3036.285 us | } /* acpi_ps_parse_loop */ 999.885391 | 1) analyze-3422 | 3038.595 us | } /* acpi_ps_parse_loop */ 999.888774 | 1) analyze-3422 | 3036.353 us | } /* acpi_ps_parse_loop */ 999.891991 | 1) analyze-3422 | 3035.571 us | } /* acpi_ps_parse_loop */ 999.895372 | 1) analyze-3422 | 3034.344 us | } /* acpi_ps_parse_loop */ 999.898565 | 1) analyze-3422 | 3011.190 us | } /* acpi_ps_parse_loop */ 999.901947 | 1) analyze-3422 | 3035.867 us | } /* acpi_ps_parse_loop */ 999.905168 | 1) analyze-3422 | 3038.766 us | } /* acpi_ps_parse_loop */ 999.908549 | 1) analyze-3422 | 3035.766 us | } /* acpi_ps_parse_loop */ 999.911768 | 1) analyze-3422 | 3036.638 us | } /* acpi_ps_parse_loop */ 999.915124 | 1) analyze-3422 | 3010.673 us | } /* acpi_ps_parse_loop */ 999.918341 | 1) analyze-3422 | 3009.048 us | } /* acpi_ps_parse_loop */ 999.921722 | 1) analyze-3422 | 3035.197 us | } /* acpi_ps_parse_loop */ 999.924939 | 1) analyze-3422 | 3034.928 us | } /* acpi_ps_parse_loop */ 999.928320 | 1) analyze-3422 | 3034.535 us | } /* acpi_ps_parse_loop */ 999.931539 | 1) analyze-3422 | 3037.450 us | } /* acpi_ps_parse_loop */ 999.934895 | 1) analyze-3422 | 3010.396 us | } /* acpi_ps_parse_loop */ 999.938112 | 1) analyze-3422 | 3034.909 us | } /* acpi_ps_parse_loop */ 999.941492 | 1) analyze-3422 | 3033.605 us | } /* acpi_ps_parse_loop */ 999.944709 | 1) analyze-3422 | 3035.493 us | } /* acpi_ps_parse_loop */ 999.948091 | 1) analyze-3422 | 3036.141 us | } /* acpi_ps_parse_loop */ 999.951309 | 1) analyze-3422 | 3035.394 us | } /* acpi_ps_parse_loop */ 999.954665 | 1) analyze-3422 | 3010.442 us | } /* acpi_ps_parse_loop */ 999.957883 | 1) analyze-3422 | 3036.092 us | } /* acpi_ps_parse_loop */ 999.961265 | 1) analyze-3422 | 3035.414 us | } /* acpi_ps_parse_loop */ 999.964484 | 1) analyze-3422 | 3037.209 us | } /* acpi_ps_parse_loop */ 999.966778 | 1) analyze-3422 | 2019.625 us | } /* acpi_ps_parse_loop */ 999.990106 | 1) analyze-3422 | 3863.816 us | } /* acpi_ps_parse_loop */ 1000.041456 | 1) analyze-3422 | 50133.20 us | } /* acpi_ec_transaction */ 1000.041579 | 1) analyze-3422 | 50256.60 us | } /* acpi_ec_space_handler */ 1000.041583 | 1) analyze-3422 | 50271.27 us | } /* acpi_ev_address_space_dispatch */ 1000.041583 | 1) analyze-3422 | 50271.81 us | } /* acpi_ex_access_region */ 1000.041583 | 1) analyze-3422 | 50272.08 us | } /* acpi_ex_field_datum_io */ 1000.041583 | 1) analyze-3422 | 50272.37 us | } /* acpi_ex_extract_from_field */ 1000.041585 | 1) analyze-3422 | 50280.49 us | } /* acpi_ex_read_data_from_field */ 1000.041585 | 1) analyze-3422 | 50281.40 us | } /* acpi_ex_resolve_node_to_value */ 1000.041586 | 1) analyze-3422 | 50281.67 us | } /* acpi_ex_resolve_to_value */ 1000.041588 | 1) analyze-3422 | 50291.19 us | } /* acpi_ds_evaluate_name_path */ 1000.041594 | 1) analyze-3422 | 50298.21 us | } /* acpi_ds_exec_end_op */ 1000.045603 | 1) analyze-3422 | 55121.15 us | } /* acpi_ps_parse_loop */ 1000.053695 | 1) analyze-3422 | 5358.237 us | } /* acpi_ps_parse_loop */ 1000.059797 | 1) analyze-3422 | 4298.178 us | } /* schedule */ 1000.059797 | 1) analyze-3422 | 4298.516 us | } /* schedule_timeout */ 1000.059798 | 1) analyze-3422 | 4299.125 us | } /* __down_timeout */ 1000.059798 | 1) analyze-3422 | 4299.987 us | } /* down_timeout */ 1000.059798 | 1) analyze-3422 | 4300.288 us | } /* acpi_os_wait_semaphore */ 1000.059869 | 1) analyze-3422 | 4375.908 us | } /* acpi_ex_system_wait_mutex */ 1000.059869 | 1) analyze-3422 | 4376.207 us | } /* acpi_ex_acquire_mutex_object */ 1000.059869 | 1) analyze-3422 | 4376.517 us | } /* acpi_ex_acquire_mutex */ 1000.059870 | 1) analyze-3422 | 4378.113 us | } /* acpi_ex_opcode_2A_0T_1R */ 1000.059886 | 1) analyze-3422 | 4418.602 us | } /* acpi_ds_exec_end_op */ 1000.064813 | 1) analyze-3422 | 9481.874 us | } /* acpi_ps_parse_loop */ 1000.068299 | 1) analyze-3422 | 2362.766 us | } /* acpi_ps_parse_loop */ 1000.071553 | 1) analyze-3422 | 2584.362 us | } /* acpi_ps_parse_loop */ 1000.086436 | 1) analyze-3422 | 1248.126 us | } /* acpi_ps_parse_loop */ 1000.102214 | 3) analyze-3422 | 2544.556 us | } /* acpi_ps_parse_loop */ 1000.105174 | 3) analyze-3422 | 1184.069 us | } /* acpi_ps_parse_loop */ 1000.107823 | 1) analyze-3422 | 1474.114 us | } /* acpi_ps_parse_loop */ 1000.125900 | 3) analyze-3422 | 1249.508 us | } /* acpi_ps_parse_loop */ 1000.128162 | 3) analyze-3422 | 1176.825 us | } /* acpi_ps_parse_loop */ 1000.129385 | 3) analyze-3422 | 272016.6 us | } /* acpi_ps_parse_aml */ 1000.129389 | 3) analyze-3422 | 272040.5 us | } /* acpi_ps_execute_method */ 1000.129403 | 3) analyze-3422 | 272138.6 us | } /* acpi_ns_evaluate */ 1000.129554 | 3) analyze-3422 | 272293.5 us | } /* acpi_evaluate_object */ 1000.129554 | 3) analyze-3422 | 272293.8 us | } /* acpi_hw_execute_sleep_method */ 1000.129594 | 3) analyze-3422 | 272500.0 us | } /* acpi_hw_legacy_wake */ 1000.129594 | 3) analyze-3422 | 272500.4 us | } /* acpi_hw_sleep_dispatch */ 1000.129594 | 3) analyze-3422 | 272500.7 us | } /* acpi_leave_sleep_state */ 1000.130598 | 3) analyze-3422 | 1000.265 us | } /* acpi_ns_evaluate */ 1000.130610 | 3) analyze-3422 | 1013.341 us | } /* acpi_evaluate_object */ 1000.130610 | 3) analyze-3422 | 1013.624 us | } /* acpi_evaluate_integer */ 1000.130615 | 3) analyze-3422 | 1019.243 us | } /* acpi_power_get_state */ 1000.134373 | 3) analyze-3422 | 1033.147 us | } /* acpi_ns_evaluate */ 1000.134384 | 3) analyze-3422 | 1045.668 us | } /* acpi_evaluate_object */ 1000.134385 | 3) analyze-3422 | 1045.947 us | } /* acpi_evaluate_integer */ 1000.134390 | 3) analyze-3422 | 1051.525 us | } /* acpi_power_get_state */ 1000.154475 | 3) analyze-3422 | 24880.61 us | } /* acpi_resume_power_resources */ 1000.154475 | 3) analyze-3422 | 318316.6 us | } /* acpi_pm_finish */ So I'm going to re-assign this bug to BIOS category. To Len: Since this is related to the AML execution, please follow the comment 3 to obtain the AML trace logs. Thanks in advance. Best regards Lv
It looks _WAK consumes the majority of the time, and waiting in a loop. Maybe it is better to use the following command line to obtain the AML trace log: acpi.trace_state=opcode-once acpi.trace_method=_WAK Thanks
confirmed that things are unchanged with 4.9 kernel (4.9.0-994-generic #201610192310 SMP Thu Oct 20 03:12:13 UTC 2016) but this binary PPA kernel doesn't build with CONFIG_ACPI_DEBUG, so will measure home built kernel to get ACPI debug info.
(In reply to Len Brown from comment #12) > confirmed that things are unchanged with 4.9 kernel > > (4.9.0-994-generic #201610192310 SMP Thu Oct 20 03:12:13 UTC 2016) > > but this binary PPA kernel doesn't build with CONFIG_ACPI_DEBUG, > so will measure home built kernel to get ACPI debug info. Sounds like my mistake, the command line should be: acpi.trace_state=opcode-once acpi.trace_method_name=_WAK Not acpi.trace_state=opcode-once acpi.trace_method=_WAK Sorry for the mistake. Thanks Lv
Just a note, that firmware 1.2.3, released on December 14th, 2016, is available [1]. Unfortunately, you cannot use UEFI Capsule to directly update it from a non Microsoft Windows machine [2]. I haven’t tested anything else yet, but can confirm that resume is not instant but takes quite a while compared to for example Google Chromebooks. Debian Stretch/testing with Linux 4.8.11 is used on the machine here, so please tell me, if I can help you in any way. (Though in my opinion, that would be Dell’s or Canonical’s job.) [1] https://downloads.dell.com/FOLDER04065439M/1/XPS_9360_1.2.3.exe [2] https://twitter.com/pmenzel_molgen/status/811528720259416064
(In reply to Paul Menzel from comment #14) > Debian Stretch/testing with Linux 4.8.11 is used on the machine here, so > please tell me, if I can help you in any way. We are putting effort to root cause any such kind of delays. So if you have same notebook, could you help to do the test: 1. Enable CONFIG_ACPI_DEBUGGER_USER=m and rebuild the kernel; 2. Boot the kernel with "acpi.trace_state=opcode acpi.trace_method_name=_WAK"; 3. Perform suspend/resume (ex., using scripts/analyze-suspend.py -m mem); 4. Upload the dmesg output here. Thanks in advance.
With Linux 4.10-rc2+ (0f64df301240 Merge branch 'parisc-4.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux) the system freezes on resume, so I have to build another Linux kernel.
(In reply to Paul Menzel from comment #16) > With Linux 4.10-rc2+ (0f64df301240 Merge branch 'parisc-4.10-2' of > git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux) the > system freezes on resume, so I have to build another Linux kernel. OK. Thanks for the help! Best regards Lv
Created attachment 250351 [details] Output of `journalctl -k` Please find the Linux kernel messages attached after running `scripts/analyze-suspend.py -m mem` and waking up by pressing the power button.
Created attachment 250371 [details] `journalctl -b _TRANSPORT=kernel -o short-precise` Here is the output with timestamps with full microsecond precision.
Using journalctl with small dmesg buffer length seems can cause log entries lost when the output is frequent. According to the log, there is no significant problems in AML execution. Just too many AML opcode executed. I'll check the log to find some clues. If I cannot, maybe I need you to do the test again with buffer size enlarged dmesg output. Will be back soon. Thanks Lv
1. I don't know why there are 2 _WAK invoked. Did you try the test twice? 2. I couldn't see significant suspend log indicator in the output. And the earliest _WAK method execution tracde logs are also missing. 2.1. 1st _WAK Jan 05 11:36:56.324315 xps13 kernel: IPv6: ADDRCONF(NETDEV_UP): wlp58s0: link is not ready Jan 05 11:37:09.473973 xps13 kernel: extrace-0175 ex_trace_point : Opcode End [0xffffbaba00062e71:ByteConst] execution. 2.2. 2nd _WAK Jan 05 11:36:56.324315 xps13 kernel: IPv6: ADDRCONF(NETDEV_UP): wlp58s0: link is not ready Jan 05 11:37:09.473973 xps13 kernel: extrace-0175 ex_trace_point : Opcode End [0xffffbaba00062e71:ByteConst] execution. During the suspend/resume process, syslogd is not working (frozen by the kernel), so if the kernel dmesg buffer (which is implemented as circular buffer) is not drained, the earliest logs will disappear due to the small log buffer size. I'd suggest you to do the test again with enlarged log buffer length (log_buf_len=1M or even larger to capture the entire _WAK execution process). 4. Examining the rest of the log entries (from bottom up). Method (_WAK, 1, NotSerialized) // _WAK: Wake { RWAK (Arg0) \_SB.PCI0.NWAK (Arg0) \_SB.PCI0.LPCB.SWAK (Arg0) WAK (Arg0) Return (WAKP) /* \WAKP */ } It looks the RWAK consumes the majority of the time: 4.1: RWAK - RWAK bein is missing, but the first line can be used as it is a part of the RWAK execution: Jan 05 11:37:09.473973 xps13 kernel: extrace-0175 ex_trace_point : Opcode End [0xffffbaba00062e71:ByteConst] execution. Jan 05 11:37:09.535415 xps13 kernel: extrace-0175 ex_trace_point : Method End [0xffffbaba000516a8:\RWAK] execution. 4.2. NWAK Jan 05 11:37:09.535482 xps13 kernel: extrace-0175 ex_trace_point : Method Begin [0xffffbaba00043c52:\_SB.PCI0.NWAK] execution. Jan 05 11:37:09.536326 xps13 kernel: extrace-0175 ex_trace_point : Method End [0xffffbaba00043c52:\_SB.PCI0.NWAK] execution. 4.3. SWAK Jan 05 11:37:09.536384 xps13 kernel: extrace-0175 ex_trace_point : Method Begin [0xffffbaba00043db8:\_SB.PCI0.LPCB.SWAK] execution. Jan 05 11:37:09.536876 xps13 kernel: extrace-0175 ex_trace_point : Method End [0xffffbaba00043db8:\_SB.PCI0.LPCB.SWAK] execution. 4.4. WAK Jan 05 11:37:09.536940 xps13 kernel: extrace-0175 ex_trace_point : Method Begin [0xffffbaba00064792:\WAK] execution. Jan 05 11:37:09.536960 xps13 kernel: extrace-0175 ex_trace_point : Method End [0xffffbaba00064792:\WAK] execution. I'll get back here after doing more investigation in RWAK. Thanks Lv
I can see time has been consumed in ECR1. And this looks really a BIOS issue: If (Arg0 == Zero) { Local0 = EC00 /* \_SB_.PCI0.LPCB.ECDV.EC00 */ } If (Arg0 == One) { Local0 = EC01 /* \_SB_.PCI0.LPCB.ECDV.EC01 */ } If (Arg0 == 0x02) { Local0 = EC02 /* \_SB_.PCI0.LPCB.ECDV.EC02 */ } If (Arg0 == 0x03) { Local0 = EC03 /* \_SB_.PCI0.LPCB.ECDV.EC03 */ } If (Arg0 == 0x04) { Local0 = EC04 /* \_SB_.PCI0.LPCB.ECDV.EC04 */ } If (Arg0 == 0x05) { Local0 = EC05 /* \_SB_.PCI0.LPCB.ECDV.EC05 */ } If (Arg0 == 0x06) { Local0 = EC06 /* \_SB_.PCI0.LPCB.ECDV.EC06 */ } If (Arg0 == 0x07) { Local0 = EC07 /* \_SB_.PCI0.LPCB.ECDV.EC07 */ } If (Arg0 == 0x08) { Local0 = EC08 /* \_SB_.PCI0.LPCB.ECDV.EC08 */ } If (Arg0 == 0x09) { Local0 = EC09 /* \_SB_.PCI0.LPCB.ECDV.EC09 */ } If (Arg0 == 0x0A) { Local0 = EC10 /* \_SB_.PCI0.LPCB.ECDV.EC10 */ } If (Arg0 == 0x0B) { Local0 = EC11 /* \_SB_.PCI0.LPCB.ECDV.EC11 */ } If (Arg0 == 0x0C) { Local0 = EC12 /* \_SB_.PCI0.LPCB.ECDV.EC12 */ } If (Arg0 == 0x0D) { Local0 = EC13 /* \_SB_.PCI0.LPCB.ECDV.EC13 */ } If (Arg0 == 0x0E) { Local0 = EC14 /* \_SB_.PCI0.LPCB.ECDV.EC14 */ } If (Arg0 == 0x0F) { Local0 = EC15 /* \_SB_.PCI0.LPCB.ECDV.EC15 */ } If (Arg0 == 0x10) { Local0 = EC16 /* \_SB_.PCI0.LPCB.ECDV.EC16 */ } If (Arg0 == 0x11) { Local0 = EC17 /* \_SB_.PCI0.LPCB.ECDV.EC17 */ } If (Arg0 == 0x12) { Local0 = EC18 /* \_SB_.PCI0.LPCB.ECDV.EC18 */ } If (Arg0 == 0x13) { Local0 = EC19 /* \_SB_.PCI0.LPCB.ECDV.EC19 */ } If (Arg0 == 0x14) { Local0 = EC20 /* \_SB_.PCI0.LPCB.ECDV.EC20 */ } If (Arg0 == 0x15) { Local0 = EC21 /* \_SB_.PCI0.LPCB.ECDV.EC21 */ } If (Arg0 == 0x16) { Local0 = EC22 /* \_SB_.PCI0.LPCB.ECDV.EC22 */ } If (Arg0 == 0x17) { Local0 = EC23 /* \_SB_.PCI0.LPCB.ECDV.EC23 */ } If (Arg0 == 0x18) { Local0 = EC24 /* \_SB_.PCI0.LPCB.ECDV.EC24 */ } If (Arg0 == 0x19) { Local0 = EC25 /* \_SB_.PCI0.LPCB.ECDV.EC25 */ } If (Arg0 == 0x1A) { Local0 = EC26 /* \_SB_.PCI0.LPCB.ECDV.EC26 */ } If (Arg0 == 0x1B) { Local0 = EC27 /* \_SB_.PCI0.LPCB.ECDV.EC27 */ } If (Arg0 == 0x1C) { Local0 = EC28 /* \_SB_.PCI0.LPCB.ECDV.EC28 */ } If (Arg0 == 0x1D) { Local0 = EC29 /* \_SB_.PCI0.LPCB.ECDV.EC29 */ } If (Arg0 == 0x1E) { Local0 = EC30 /* \_SB_.PCI0.LPCB.ECDV.EC30 */ } If (Arg0 == 0x1F) { Local0 = EC31 /* \_SB_.PCI0.LPCB.ECDV.EC31 */ } If (Arg0 == 0x20) { Local0 = EC32 /* \_SB_.PCI0.LPCB.ECDV.EC32 */ } If (Arg0 == 0x21) { Local0 = EC33 /* \_SB_.PCI0.LPCB.ECDV.EC33 */ } If (Arg0 == 0x22) { Local0 = EC34 /* \_SB_.PCI0.LPCB.ECDV.EC34 */ } If (Arg0 == 0x23) { Local0 = EC35 /* \_SB_.PCI0.LPCB.ECDV.EC35 */ } If (Arg0 == 0x24) { Local0 = EC36 /* \_SB_.PCI0.LPCB.ECDV.EC36 */ } If (Arg0 == 0x25) { Local0 = EC37 /* \_SB_.PCI0.LPCB.ECDV.EC37 */ } If (Arg0 == 0x26) { Local0 = EC38 /* \_SB_.PCI0.LPCB.ECDV.EC38 */ } If (Arg0 == 0x27) { Local0 = EC39 /* \_SB_.PCI0.LPCB.ECDV.EC39 */ } If (Arg0 == 0x28) { Local0 = EC40 /* \_SB_.PCI0.LPCB.ECDV.EC40 */ } If (Arg0 == 0x29) { Local0 = EC41 /* \_SB_.PCI0.LPCB.ECDV.EC41 */ } If (Arg0 == 0x2A) { Local0 = EC42 /* \_SB_.PCI0.LPCB.ECDV.EC42 */ } If (Arg0 == 0x2B) { Local0 = EC43 /* \_SB_.PCI0.LPCB.ECDV.EC43 */ } If (Arg0 == 0x2C) { Local0 = EC44 /* \_SB_.PCI0.LPCB.ECDV.EC44 */ } If (Arg0 == 0x2D) { Local0 = EC45 /* \_SB_.PCI0.LPCB.ECDV.EC45 */ } If (Arg0 == 0x2E) { Local0 = EC46 /* \_SB_.PCI0.LPCB.ECDV.EC46 */ } If (Arg0 == 0x2F) { Local0 = EC47 /* \_SB_.PCI0.LPCB.ECDV.EC47 */ } If (Arg0 == 0x30) { Local0 = EC48 /* \_SB_.PCI0.LPCB.ECDV.EC48 */ } If (Arg0 == 0x31) { Local0 = EC49 /* \_SB_.PCI0.LPCB.ECDV.EC49 */ } If (Arg0 == 0x32) { Local0 = EC50 /* \_SB_.PCI0.LPCB.ECDV.EC50 */ } If (Arg0 == 0x33) { Local0 = EC51 /* \_SB_.PCI0.LPCB.ECDV.EC51 */ } If (Arg0 == 0x34) { Local0 = EC52 /* \_SB_.PCI0.LPCB.ECDV.EC52 */ } If (Arg0 == 0x35) { Local0 = EC53 /* \_SB_.PCI0.LPCB.ECDV.EC53 */ } If (Arg0 == 0x36) { Local0 = EC54 /* \_SB_.PCI0.LPCB.ECDV.EC54 */ } If (Arg0 == 0x37) { Local0 = EC55 /* \_SB_.PCI0.LPCB.ECDV.EC55 */ } If (Arg0 == 0x38) { Local0 = EC56 /* \_SB_.PCI0.LPCB.ECDV.EC56 */ } See, the code should utilize "Else", but there is no "Else". So each EC opregion access causes huge time to execute bunch of If () blocks. This is the reason of the 280ms delay. Just dell ASL developers' silly mistake. I think the problem is root caused. And can be closed.
In the provided log, there are only 2 ECR1s invoked and consumed all of the time: 1. 1st ECR1, the begin is missing, so we can use the 1st line: Jan 05 11:37:09.473973 xps13 kernel: extrace-0175 ex_trace_point : Opcode End [0xffffbaba00062e71:ByteConst] execution. Jan 05 11:37:09.499250 xps13 kernel: extrace-0175 ex_trace_point : Method End [0xffffbaba00062c7c:\_SB.PCI0.LPCB.ECDV.ECR1] execution. 2. 2nd ECR1: Jan 05 11:37:09.523229 xps13 kernel: extrace-0175 ex_trace_point : Method Begin [0xffffbaba00062c7c:\_SB.PCI0.LPCB.ECDV.ECR1] execution. Jan 05 11:37:09.535394 xps13 kernel: extrace-0175 ex_trace_point : Method End [0xffffbaba00062c7c:\_SB.PCI0.LPCB.ECDV.ECR1] execution. Each If in ECR1 consumes about 30ms (because of printk), and the real EC accesses is not so significant: Line 472: Jan 05 11:37:09.523338 xps13 kernel: extrace-0175 ex_trace_point : Opcode Begin [0xffffbaba00062c7c:If] execution. Line 479: Jan 05 11:37:09.523644 xps13 kernel: extrace-0175 ex_trace_point : Opcode End [0xffffbaba00062c7c:If] execution. Line 490: Jan 05 11:37:09.523966 xps13 kernel: extrace-0175 ex_trace_point : Opcode Begin [0xffffbaba00062c9b:If] execution. Line 497: Jan 05 11:37:09.524207 xps13 kernel: extrace-0175 ex_trace_point : Opcode End [0xffffbaba00062c9b:If] execution. Line 503: Jan 05 11:37:09.524232 xps13 kernel: extrace-0175 ex_trace_point : Opcode Begin [0xffffbaba00062ca6:If] execution. Line 510: Jan 05 11:37:09.524496 xps13 kernel: extrace-0175 ex_trace_point : Opcode End [0xffffbaba00062ca6:If] execution. Line 516: Jan 05 11:37:09.524531 xps13 kernel: extrace-0175 ex_trace_point : Opcode Begin [0xffffbaba00062cb1:If] execution. Line 523: Jan 05 11:37:09.524741 xps13 kernel: extrace-0175 ex_trace_point : Opcode End [0xffffbaba00062cb1:If] execution. Line 529: Jan 05 11:37:09.524758 xps13 kernel: extrace-0175 ex_trace_point : Opcode Begin [0xffffbaba00062cbd:If] execution. Line 536: Jan 05 11:37:09.524906 xps13 kernel: extrace-0175 ex_trace_point : Opcode End [0xffffbaba00062cbd:If] execution. Line 542: Jan 05 11:37:09.524923 xps13 kernel: extrace-0175 ex_trace_point : Opcode Begin [0xffffbaba00062cc9:If] execution. Line 549: Jan 05 11:37:09.525072 xps13 kernel: extrace-0175 ex_trace_point : Opcode End [0xffffbaba00062cc9:If] execution. Line 555: Jan 05 11:37:09.525088 xps13 kernel: extrace-0175 ex_trace_point : Opcode Begin [0xffffbaba00062cd5:If] execution. Line 562: Jan 05 11:37:09.525216 xps13 kernel: extrace-0175 ex_trace_point : Opcode End [0xffffbaba00062cd5:If] execution. Line 568: Jan 05 11:37:09.525237 xps13 kernel: extrace-0175 ex_trace_point : Opcode Begin [0xffffbaba00062ce1:If] execution. Line 581: Jan 05 11:37:09.525451 xps13 kernel: extrace-0175 ex_trace_point : Opcode End [0xffffbaba000616b8:If] execution. Line 837: Jan 05 11:37:09.525471 xps13 kernel: extrace-0175 ex_trace_point : Opcode Begin [0xffffbaba000616c6:If] execution. Line 846: Jan 05 11:37:09.525616 xps13 kernel: extrace-0175 ex_trace_point : Opcode End [0xffffbaba000616c6:If] execution. Line 856: Jan 05 11:37:09.525776 xps13 kernel: extrace-0175 ex_trace_point : Opcode End [0xffffbaba00051c23:If] execution. Line 857: Jan 05 11:37:09.525790 xps13 kernel: extrace-0175 ex_trace_point : Opcode Begin [0xffffbaba00051c4c:If] execution. Line 864: Jan 05 11:37:09.525909 xps13 kernel: extrace-0175 ex_trace_point : Opcode End [0xffffbaba00062ce1:If] execution. Line 865: Jan 05 11:37:09.525932 xps13 kernel: extrace-0175 ex_trace_point : Opcode Begin [0xffffbaba00062ced:If] execution. Line 872: Jan 05 11:37:09.526047 xps13 kernel: extrace-0175 ex_trace_point : Opcode End [0xffffbaba00062ced:If] execution. Line 873: Jan 05 11:37:09.526063 xps13 kernel: extrace-0175 ex_trace_point : Opcode Begin [0xffffbaba00062cf9:If] execution. Line 880: Jan 05 11:37:09.526192 xps13 kernel: extrace-0175 ex_trace_point : Opcode End [0xffffbaba00062cf9:If] execution. Line 881: Jan 05 11:37:09.526222 xps13 kernel: extrace-0175 ex_trace_point : Opcode Begin [0xffffbaba00062d05:If] execution. Line 888: Jan 05 11:37:09.526427 xps13 kernel: extrace-0175 ex_trace_point : Opcode End [0xffffbaba00062d05:If] execution. Line 889: Jan 05 11:37:09.526461 xps13 kernel: extrace-0175 ex_trace_point : Opcode Begin [0xffffbaba00062d11:If] execution. Line 896: Jan 05 11:37:09.526663 xps13 kernel: extrace-0175 ex_trace_point : Opcode End [0xffffbaba00062d11:If] execution. Line 897: Jan 05 11:37:09.526687 xps13 kernel: extrace-0175 ex_trace_point : Opcode Begin [0xffffbaba00062d1d:If] execution. Line 904: Jan 05 11:37:09.526836 xps13 kernel: extrace-0175 ex_trace_point : Opcode End [0xffffbaba00062d1d:If] execution. Line 905: Jan 05 11:37:09.526850 xps13 kernel: extrace-0175 ex_trace_point : Opcode Begin [0xffffbaba00062d29:If] execution. Line 912: Jan 05 11:37:09.526970 xps13 kernel: extrace-0175 ex_trace_point : Opcode End [0xffffbaba00062d29:If] execution. Line 913: Jan 05 11:37:09.526982 xps13 kernel: extrace-0175 ex_trace_point : Opcode Begin [0xffffbaba00062d35:If] execution. Line 920: Jan 05 11:37:09.527106 xps13 kernel: extrace-0175 ex_trace_point : Opcode End [0xffffbaba00062d35:If] execution. Line 921: Jan 05 11:37:09.527129 xps13 kernel: extrace-0175 ex_trace_point : Opcode Begin [0xffffbaba00062d41:If] execution. Line 928: Jan 05 11:37:09.527246 xps13 kernel: extrace-0175 ex_trace_point : Opcode End [0xffffbaba00062d41:If] execution. Line 929: Jan 05 11:37:09.527261 xps13 kernel: extrace-0175 ex_trace_point : Opcode Begin [0xffffbaba00062d4d:If] execution. Line 936: Jan 05 11:37:09.527416 xps13 kernel: extrace-0175 ex_trace_point : Opcode End [0xffffbaba00062d4d:If] execution. Line 937: Jan 05 11:37:09.527431 xps13 kernel: extrace-0175 ex_trace_point : Opcode Begin [0xffffbaba00062d59:If] execution. Line 944: Jan 05 11:37:09.527552 xps13 kernel: extrace-0175 ex_trace_point : Opcode End [0xffffbaba00062d59:If] execution. Line 945: Jan 05 11:37:09.527572 xps13 kernel: extrace-0175 ex_trace_point : Opcode Begin [0xffffbaba00062d65:If] execution. Line 952: Jan 05 11:37:09.527682 xps13 kernel: extrace-0175 ex_trace_point : Opcode End [0xffffbaba00062d65:If] execution. Line 953: Jan 05 11:37:09.527703 xps13 kernel: extrace-0175 ex_trace_point : Opcode Begin [0xffffbaba00062d71:If] execution. Line 960: Jan 05 11:37:09.527821 xps13 kernel: extrace-0175 ex_trace_point : Opcode End [0xffffbaba00062d71:If] execution. Line 961: Jan 05 11:37:09.527835 xps13 kernel: extrace-0175 ex_trace_point : Opcode Begin [0xffffbaba00062d7d:If] execution. Line 968: Jan 05 11:37:09.527951 xps13 kernel: extrace-0175 ex_trace_point : Opcode End [0xffffbaba00062d7d:If] execution. Line 969: Jan 05 11:37:09.527965 xps13 kernel: extrace-0175 ex_trace_point : Opcode Begin [0xffffbaba00062d89:If] execution. Line 976: Jan 05 11:37:09.528087 xps13 kernel: extrace-0175 ex_trace_point : Opcode End [0xffffbaba00062d89:If] execution. Line 977: Jan 05 11:37:09.528109 xps13 kernel: extrace-0175 ex_trace_point : Opcode Begin [0xffffbaba00062d95:If] execution. Line 984: Jan 05 11:37:09.528230 xps13 kernel: extrace-0175 ex_trace_point : Opcode End [0xffffbaba00062d95:If] execution. Line 985: Jan 05 11:37:09.528254 xps13 kernel: extrace-0175 ex_trace_point : Opcode Begin [0xffffbaba00062da1:If] execution. Line 992: Jan 05 11:37:09.528447 xps13 kernel: extrace-0175 ex_trace_point : Opcode End [0xffffbaba00062da1:If] execution. Line 993: Jan 05 11:37:09.528462 xps13 kernel: extrace-0175 ex_trace_point : Opcode Begin [0xffffbaba00062dad:If] execution. Line 1000: Jan 05 11:37:09.528582 xps13 kernel: extrace-0175 ex_trace_point : Opcode End [0xffffbaba00062dad:If] execution. Line 1001: Jan 05 11:37:09.528604 xps13 kernel: extrace-0175 ex_trace_point : Opcode Begin [0xffffbaba00062db9:If] execution. The ASL developer simply downgrade the performance because of this. And since EC accesses are the major work in _WAK, this simply makes the time consumption 28-56 times worse than a normal _WAK. We can capture how many ECR1 executed for this bug. Please boot the kernel with: acpi.trace_state=method acpi.trace_method_name="_WAK" log_buf_len=2M Performing suspend/resume. Upload dmesg output here. You can also use journalctrl, but enlarged log_buf_len is required. And if 2M is not big enough, please help to choose a suitable value. Thanks Lv
Created attachment 250491 [details] Customized DSDT The customized DSDT can help to detect if the performance can be tuned by adding Else in ECR1/ECW1. To use it, putting this file in include/acpi folder, and configure CONFIG_ACPI_CUSTOM_DSDT_FILE=acpi/dsdt.aml, then rebuild the kernel. Thanks and best regards Lv
Created attachment 250501 [details] Method override This file can be used to override ECR1/ECW1 during runtime. If you don't want to use the customized DSDT, please check Documentation/acpi/method-customizing.txt to use this instead. Thanks and best regards Lv
Just a suggestion for those who want to load a DSDT without recompiling the kernel, you can use the Clover EFI bootloader: https://sourceforge.net/projects/cloverefiboot/ You can manually extract the EFI binaries from the CloverISO (macOS not required) and place them into the EFI partition at /EFI/CLOVER (can also copy /EFI/BOOT/BOOTX64.EFI if you don't want to edit NVRAM). For Linux, place the replacement ACPI files in /EFI/CLOVER/ACPI/Linux and CLOVER will load them before chaining into GRUB.
Closing as problem root caused.
I was asked by Jared from Dell to update this report [1], that this is still an issue with the latest firmware 1.3.2. [1] https://twitter.com/djdmngz/status/852166663986393088
Created attachment 255887 [details] analyze_suspend output on 4.11-rc6 w/ BIOS 1.3.2 I see the same result as Paul. Still 260 ms in acpi_pm_finish() with latest BIOS (1.3.2, 01/18/2017) and latest Linux kernel (4.11-rc6+) Note that this issue is the #3 offender for XPS13-9360 resume-time, following #1 ath10k #2 USB
Mario, has there been any progress on Dell’s side? The new firmware release 1.3.5 lists only the ME firmware update. > XPS 13 9360 1.3.5 BIOS > > ### Fixes & Enhancements ### > > Updated Intel ME Firmware to address security advisory CVE-2017-5689 / > INTEL-SA-00075 Here is reply from Dell regarding another delay, which I think is in the embedded controller firmware. > I have just received additional info from Engineering regarding this matter > > As it seems this issue, will not be investigated further due to the fact > that | 600ms , delay, is not visually visible, nor does it affect the > functionality and or operation Engineering has noted that they will not be > able to raise this further with ODM, or the BIOS TEAMS for deeper review > > Apologies for this inconvenience, but what I can also propose, if this > matter or the nature of this incident does become bigger I guess that shows, that Dell is not willing to compete with the quality of Google Chromebooks and -boxes or Apple. The lesson learned, stay away from Dell, and help free software to succeed in the firmware space too.
Created attachment 258217 [details] analyze suspend with 4.13, latest bios For me the issue is much worse, spending over 1 second in acpi_pm_finish. Could this be reinvestigated, the bug reopened, or shall I open a new bug?
(In reply to Esokrarkose from comment #31) > Created attachment 258217 [details] > analyze suspend with 4.13, latest bios > > For me the issue is much worse, spending over 1 second in acpi_pm_finish. > Could this be reinvestigated, the bug reopened, or shall I open a new bug? Thank you for the follow-up. Unfortunately, as written in my comment, it’s a *firmware bug*, so Linux cannot do anything about it. Your options are to hack the firmware yourself, or to contact the Dell support, and make that issue public. I urge you to at least open a call with them, so that they see, that customers care.
Contacting Dell about this can be problematic. I already try to contact them about problems with Precision 5510 ACPI: - http://en.community.dell.com/techcenter/os-applications/f/4457/t/20006889 - https://bugzilla.kernel.org/show_bug.cgi?id=109511 It takes me ~2 months to reach someone from Bios team and final reply was: "... BIOS team has confirmed that these errors do not impact device or OS functionality, that the presence of these errors in logs is due to the change in logging policy set in the linux kernel, that the acpi functions causing errors to be seen are part of the dell proprietary power management interface, and that kernel support of these calls is not needed." Funny as for me calling non existing methods and referring to non existing devices in DSDT is an error, but it seems for a fix I need to buy newer model which has almost all missing devices/namespaces present: - http://en.community.dell.com/techcenter/os-applications/f/4457/p/20006889/21004428#21004428
Yes, you need a long breath, but you definitely should contact them, complain directly to Dell and also publicly.
Ok reading http://en.community.dell.com/techcenter/os-applications/f/4457/p/20006889/21004428#21004428 I wont bother contacting them as I am sure they don't care if it takes 2 or 3 seconds to wake up from suspend :-(. I would only try if you could refer me to a resource where I do not have to waste hours just to reach someone who actually could make a difference.
(In reply to Esokrarkose from comment #35) > Ok reading > http://en.community.dell.com/techcenter/os-applications/f/4457/p/20006889/ > 21004428#21004428 I wont bother contacting them as I am sure they don't care > if it takes 2 or 3 seconds to wake up from suspend :-(. Sorry, but that is the worst decision. Please document publicly, that you expect fast boot-times. That’s the least you should do. > I would only try if you could refer me to a resource where I do not have to > waste hours just to reach someone who actually could make a difference. Yes, please contact the support with your service tag number. Also, please test the customized DSDT from comment #24.
Recently, Linux decided not to do busy polling for EC during suspend/resume noirq stages. Which can make things a bit worse. Actually the correct way of tuning this is not in Linux kernel, but in BIOS, BIOS should avoid too many EC transactions during suspend/resume by tuning suspend/resume AML control methods. So leave this report closed/documented.
Thanks very much for the responses. I will try to reach Dell, but the first try already got me angry: How am I supposed to email them without hanging forever on that "Dell System Detect" page that wants me to download an run some exe file despite having selected that I am running Linux!? It seems impossible to me to reach out to them per mail with my service tag without running that diagnosis exe. Either I am stupid or this is really impossible without phoning them. What if my hardware would be broken!?
(In reply to Esokrarkose from comment #38) > Thanks very much for the responses. I will try to reach Dell, but the first > try already got me angry: How am I supposed to email them without hanging > forever on that "Dell System Detect" page that wants me to download an run > some exe file despite having selected that I am running Linux!? It seems > impossible to me to reach out to them per mail with my service tag without > running that diagnosis exe. Either I am stupid or this is really impossible > without phoning them. What if my hardware would be broken!? This requirement before opening a request seems to be new. You can reach the Dell support over Twitter, and find out what to do.
Created attachment 258319 [details] analyze_suspend with Kernel 4.13, BIOS 2.2.1 I am uploading the analyze suspend output with more information to prove that acpi_ps_parse_aml takes over 1 second. I get the same result on kernel 4.9 (so I can't really reproduce the smaller footprint on that kernel), so the bug title is underrating the issue, please update the title to make sure that the guys from Dell do not underrate this issue. Lv Zheng: Should I still test with the customized DSDT?
> Lv Zheng: Should I still test with the customized DSDT? We are falling back old tuning state due to the following commit: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=66259146 It's not a good timing for us to work further on tuning this again as we have lost our base line. Thanks and best regards Lv
(In reply to Lv Zheng from comment #41) > > Lv Zheng: Should I still test with the customized DSDT? > > We are falling back old tuning state due to the following commit: > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/ > ?id=66259146 > It's not a good timing for us to work further on tuning this again as we > have lost our base line. What do you mean by “lost our base line”? Additionally, instead of a revert, I would have preferred a Linux command line option to enable that behavior (disabled by default).
Paul: My interpretation of "lost our base line" is, as mentioned in the commit message, the fans blowing up in case of the Thinkpad X1 Carbon is considered a worse issue than the slow resume time. I do not understand that logic though: As mentioned in the commit message, a firmware update has been released to address the blowing up fans on that hardware whereas in case of the XPS 13 there has been no firmware update and the latter is officially supported for Linux by the vendor in contrast to the former.
By the way, with Linux 4.15.2 `acpi_pm_finish` now takes 1.1 seconds. :(
I subscribed Mario from Dell here, I hope he could have a look and drop a comment. @Paul: I can confirm the 1.1 seconds ... when patching the kernel with the patch I am going to upload after this comment I can reduce it to 300ms which is a bad value also.
Created attachment 274133 [details] patch to improve wake up time 3 times faster wake up.
Created attachment 274137 [details] 4.9 kernel patched benchmark I have used the 4.9 kernel for a long time and the patch has been very effective there as you can see (25ms for acpi_pm_finish).
Did some testing, patched kernel 4.9.80 also regressed a lot. The culprit is pti, booting with "nopti" I can get acpi_pm_finish down to 40ms on 4.15.3.
Created attachment 274139 [details] patched kernel 4.15 booted with nopti
Created attachment 274141 [details] patched kernel 4.15
@Esokrakose: You should see if this patch that was just applied to bleeding-edge for linux-pm helps: https://patchwork.kernel.org/patch/10207819/ https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=bleeding-edge&id=3cd091a773936c54344a519f7ee1379ccb620bee
Mario, From what I can see is that this patch requires the kernel command line option "acpi.ec_no_wakeup=1" and does the exact same thing as the patch I was testing (see the attachment from comment 46). However as I noted, since the introduction of pti the patch is not as effective anymore (pti causes a 8x slowdown in acpi_pm_finish), see my testing (effective is relative, it is still a 3x improvement). Would it be possible to disable pti before entering suspend and re-enable it on resume? Mario, as pointed out things should be tuned in the Bios, could you talk to the BIOS devs at Dell?
Esokrarkose: Right, I figured it's easy enough for you to add that on your kernel command line rather than carry an out of tree patch. Regarding pti: That would be best to bring up on the mailing lists not on a closed issue that has been root caused. Regarding tuning in the BIOS: Changing number of EC transactions happening over suspend/resume isn't a trivial change. EC transactions have a purpose and I don't believe I can make any case to anyone in BIOS team to make this type of architectural change in shipping systems.
Esokrarkose, could you create a new bug report for the PTI issue to document it? Thank you for your other patches. Mario, Rafael, the referenced commit [1] should be marked for stable. [1] https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=bleeding-edge&id=3cd091a773936c54344a519f7ee1379ccb620bee