Hello, for several weeks now my laptop started having issues during suspend. It usually happens for a few times after a new kernel is installed, after which it goes away on its own - only to return with a kernel update. I have managed to capture the pm-trace: dmesg | grep -i -A2 magic [ 4.634329] PM: Magic number: 14:984:176 [ 4.634412] acpi device:39: hash matches [ 4.634425] acpi device:0c: hash matches I bought this machine way back in 2011 and it was suspending and resuming successfully for years, even after the suspend/resume stopped working under Windows after I upgraded the GPU. What additional info I could provide in order to make this bug possible to solve? Thank you!
(In reply to Julian Sikorski from comment #0) > Hello, > > for several weeks now my laptop started having issues during suspend. what did you do that triggers this issue? do you have a know kernel version that suspend works all the time?
The issue seems to be triggered by kernel update: after updating the kernel the freezes happen a couple of times, after which the situation stabilizes and suspend works. I do not really have a version working all the time, it also would be hard to find as the issue does not happen every time but only a few times after rebooting into a new kernel. In the meantime I was able to test all modes listed in [1] and I was never able to trigger the lockup using any of the modes. In case it helps, the issue looks as follows: screen goes black, leds blink, then the wlan led turns on and off the final timeup and the laptop never suspends. Is there a way to somehow delay the suspend of screen, ethernet or some other way to produce a log of what is happening? what are acpi devices 39 and 0c? Does this hash match even make sense? [1] https://www.kernel.org/doc/Documentation/power/basic-pm-debugging.txt
I am now thinking that kernel updates have nothing to do with this issue, it is rather triggered by reboot/cold boot. I just experienced the lockup by rebooting from linux into windows and back again.
In the meantime I was able to confirm that the issue is somewhat reliably triggerred by cold boot. I have now upgraded to Fedora 31 prerelease, its kernel-5.3.4-300.fc31.x86_64 is affected too. My gut feeling is that the problem was not present during Fedora 30 timeframe, which would put a regression after its release kernel-5.0.9-301.fc30.x86_64. I will try to test this theory and report back.
I have managed to trigger the issue a few times with kernel-5.1.18-300.fc30.x86_64. Then the resume started working with the same kernel. Out of curiosity: this laptop is now over 8 years old. Could hardware issues like weak RTC battery or dried out caps be responsible for such behaviour?
I have now tried every known reproducer on kernel-5.0.9-301.fc30.x86_64 and I was not able to trigger the freeze. I am afraid this is as good of a proof that the regression happened during 5.0 -> 5.1 as I will be able to provide, as due to intermittent nature of this problem I can only definitely confirm which kernels are affected but not which are not. I have also noticed that there is a difference in LED behaviour between freezing and not: If the laptop suspends successfully, the wifi led turns off, drive led turns on, drive led turns off and the laptop suspends: https://youtu.be/1CHzkR-wsTA During the freeze, after the drive led turns off, the wifi led goes back on and off again: https://youtu.be/Cz-ZS8ccSig Not sure how useful the LED info is, but I thought any information might be useful at this point.
I have upgraded to Fedora 31 (5.3.6-300.fc31.x86_64), it is also affected. In the meantime I have observed that running # echo core > /sys/power/pm_test # echo mem > /sys/power/state # echo none > /sys/power/pm_test does reliably prevent the lockups from happening.
I tried netconsole, but it appears that the connection is killed before the issue occurs: Oct 20 11:55:07 192.168.0.4 [ 114.780323] PM: suspend entry (deep) Oct 20 11:55:07 192.168.0.4 [ 114.786783] Filesystems sync: 0.006 seconds Oct 20 11:55:07 192.168.0.4 [ 115.048760] Freezing user space processes ... (elapsed 0.002 seconds) done. Oct 20 11:55:07 192.168.0.4 [ 115.050887] OOM killer disabled. Oct 20 11:55:07 192.168.0.4 [ 115.050894] Freezing remaining freezable tasks ... (elapsed 0.000 seconds) done. Oct 20 11:55:07 192.168.0.4 [ 115.052165] wlp4s0: deauthenticating from ac:22:05:e2:26:3a by local choice (Reason: 3=DEAUTH_LEAVING) Oct 20 11:55:07 192.168.0.4 [ 115.066048] sd 0:0:0:0: [sda] Synchronizing SCSI cache Oct 20 11:55:07 192.168.0.4 [ 115.068668] sd 0:0:0:0: [sda] Stopping disk
Created attachment 285573 [details] screenshot of console during the error I have managed to get something displayed on the local console using no_console_suspend and filming it in slow motion. The quality is bad, but one can read that the first error is something like: xhci_hcd WARN xmc cmd_run timeout This would indicate the error I am having might be related to this: https://askubuntu.com/questions/1003015/dell-precision-7510-fails-to-suspend-xhci-hcd-timeout
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1844021 According to git history the possible source of this problem https://lkml.org/lkml/2019/2/7/1028 Was introduced to 5.1.x branch in 5.1.8. This would match with why I was seeing freezes in 5.1.18 but not in 5.0.9. According to the ubuntu bug report, the fix is https://lkml.org/lkml/2019/9/26/581 I will build a kernel including this patch and report back.
Created attachment 285577 [details] better screenshot of console during error I have now tried 5.3.7-300.fc31 which included the timeout extension fix. Unfortunately, the problem is still there. My laptop contains a different USB controller though: 02:00.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 03)
Created attachment 285579 [details] dmesg output As trying to suspend from tty did not freeze the system with this new kernel, I was able to obtain text logs: [ 381.200817] PM: suspend entry (deep) [ 381.208037] Filesystems sync: 0.007 seconds [ 381.212775] Freezing user space processes ... (elapsed 0.002 seconds) done. [ 381.215164] OOM killer disabled. [ 381.215172] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. [ 381.216871] wlp4s0: deauthenticating from ac:22:05:e2:26:3a by local choice (Reason: 3=DEAUTH_LEAVING) [ 381.227244] sd 0:0:0:0: [sda] Synchronizing SCSI cache [ 381.228008] sd 0:0:0:0: [sda] Stopping disk [ 381.806585] xhci_hcd 0000:02:00.0: WARN: xHC CMD_RUN timeout [ 381.806600] PM: suspend_common(): xhci_pci_suspend+0x0/0xd0 returns -110 [ 381.806604] PM: pci_pm_suspend(): hcd_pci_suspend+0x0/0x30 returns -110 [ 381.806609] PM: dpm_run_callback(): pci_pm_suspend+0x0/0x130 returns -110 [ 381.806612] PM: Device 0000:02:00.0 failed to suspend async: error -110 [ 383.877358] PM: Some devices failed to suspend, or early wake event detected [ 383.880034] sd 0:0:0:0: [sda] Starting disk [ 383.900618] jme 0000:03:00.0 enp3s0f0: Link is down [ 384.185026] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300) [ 384.185066] ata4: SATA link down (SStatus 0 SControl 300) [ 384.200108] ata3.00: configured for UDMA/133 [ 384.450962] firewire_core 0000:05:00.0: rediscovered device fw0 [ 384.521436] PM: resume devices took 0.644 seconds [ 384.521479] OOM killer enabled. [ 384.521483] Restarting tasks ... done. [ 384.978674] psmouse serio2: synaptics: queried max coordinates: x [..5692], y [..4680] [ 385.416945] video LNXVIDEO:00: Restoring backlight state [ 385.424806] PM: suspend exit
I am going to move this to linux-usb