Running Arch Linux on an ASUS notebook (Zenbook UX303), I noticed an issue that occurs from time to time when suspending.

It happens roughly every two days (when suspending multiple times without rebooting or shutting down) or after long use of the notebook.

The notebook doesn't complete its transition to the suspended state: some internal components (possibly the HDD) are still running and I cannot return to the system. I can only perform a hard shutdown (10-second press on the power button).

The peculiar aspect of this bug is that it only occurs when a certain PM option is activated (Runtime Power Management for PCI, RUNTIME_PM_ON_BAT=auto with TLP).
Maybe it is also worth reporting this issue to TLP. BTW, is it possible to find out what exactly TLP does on your laptop?
(In reply to Maxime Martineau from comment #0)
> Running Archlinux on some asus notebook (Zenbook UX303), I noticed an issue
> occuring when suspending from time to time.
>
> It happens maybe every two days (when suspending multiple times without
> rebooting or shutting down) or after a long usage of the notebook.
>
> The notebook doesn't complete its transition to the suspended state : some
> inner parts (possibly the hdd) is still running and I cannot return to the
> system. I can only perform a hard shut down (10 sec press on the power
> button.)
>
> The particular aspect of this bug is that it is only occuring when a certain
> PM option is activated (Runtime Power Management for PCI,
> RUNTIME_PM_ON_BAT=auto with TLP)

Does this problem still exist if the laptop is running with AC plugged in?
My UX32LN (identical to the 303LN) is having this exact issue on both AC and BATTERY mode with TLP enabled.
Maxime reported this to TLP here: https://github.com/linrunner/TLP/issues/162

Maxime, did you try the TLP developer's suggestions? i.e.
"
Hi,
this is a (common) driver/kernel issue. I suggest to:
- Try a newer/older kernel
- Disable completely on battery via RUNTIME_PM_ON_BAT=on
- Blacklist PCIe devices one by one in RUNTIME_PM_BLACKLIST= and re-check to isolate the offending device

[1] http://linrunner.de/en/tlp/docs/tlp-configuration.html#runtimepm
"
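The "blacklist devices one by one" suggestion can be sketched as a small helper that prints one candidate RUNTIME_PM_BLACKLIST line per device, to be tried in the TLP configuration one at a time. This is only an illustration: the function names are made up here, the PCI addresses are placeholders to be replaced with the ones lspci reports on the affected machine, and the space-separated quoted value format is an assumption that should be checked against the TLP documentation for the installed version.

```python
def blacklist_line(addresses):
    """Format a RUNTIME_PM_BLACKLIST line from short PCI addresses.

    Assumption: TLP expects a space-separated list in double quotes.
    """
    return 'RUNTIME_PM_BLACKLIST="{}"'.format(" ".join(addresses))


def one_by_one(candidates):
    """Return one config line per candidate, so each device is excluded alone."""
    return [blacklist_line([addr]) for addr in candidates]


if __name__ == "__main__":
    # Placeholder addresses; replace with the devices listed by lspci.
    for line in one_by_one(["00:02.0", "00:16.0", "00:1b.0"]):
        print(line)
```

Each printed line would be tried for several days of suspend/resume cycles before moving on to the next one, since the freeze takes a long time to reproduce.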
The computer still freezes on suspend with the newest kernel, regardless of AC/BAT. That is, I assume runtime PM is turned off when AC mode is active (either forced or plugged in).
Aaron, I am currently changing the RUNTIME_PM blacklist each time the bug happens, but it takes a long time to occur. I will keep track of my progress on this ticket and on the one in TLP's bug tracker.

Petter, you're right: RUNTIME_PM is deactivated by default on bat.

Zhang, I can't say whether the bug occurs on AC or not. All I can say is that the bug does not occur when the laptop has been running without RUNTIME_PM since boot.
I blacklisted the discrete graphics card (GeForce 840M) and achieved 3 days of uptime (i.e. 3 days without hitting the bug while suspending and being forced to reboot). I will leave it a few more days to be sure, but this seems to be a good lead. Note that this NVIDIA card is managed by bbswitch (which keeps it off most of the time) and bumblebee (which enables it on demand).
In fact the bug has occurred again. I now think it is the HDD that doesn't stop. I blacklisted it, and after a few hours and several suspends, the laptop refuses to suspend at all. Here is the log it yields: http://pastebin.com/x26GKiVf
For some reason, I can't open the pastebin.com website... And I don't quite understand what "blacklist" means here: did you mean blacklisting the driver? But how can you use the system if you blacklist the hard drive's driver? The same goes for the GPU: do you mean you have blacklisted the NVIDIA GPU's driver?
I meant blacklisting the RUNTIME_PM option on these devices (GPU and HDD).

Here is the output:

[37294.452987] PM: Syncing filesystems ... done.
[37294.766167] PM: Preparing system for sleep (freeze)
[37294.766317] bbswitch: enabling discrete graphics
[37295.000885] Freezing user space processes ...
[37315.021714] Freezing of tasks failed after 20.004 seconds (2 tasks refusing to freeze, wq_busy=0):
[37315.021748] brscan-skey-0.2 D ffff88022ef15200 0 687 669 0x00000004
[37315.021757] ffff8800c9983cb8 0000000000000086 ffff880225692940 ffff8800c983d280
[37315.021763] ffff8800c9983cb8 ffff8800c9984000 ffff88007fc7c0f4 ffff8800c983d280
[37315.021768] 00000000ffffffff ffff88007fc7c0f8 ffff8800c9983cd8 ffffffff8157283e
[37315.021774] Call Trace:
[37315.021790] [<ffffffff8157283e>] schedule+0x3e/0x90
[37315.021797] [<ffffffff81572c25>] schedule_preempt_disabled+0x15/0x20
[37315.021804] [<ffffffff8157404a>] __mutex_lock_slowpath+0xca/0x140
[37315.021810] [<ffffffff815740db>] mutex_lock+0x1b/0x30
[37315.021823] [<ffffffffa0091657>] read_descriptors+0x37/0x100 [usbcore]
[37315.021831] [<ffffffff8124851a>] sysfs_kf_bin_read+0x4a/0x70
[37315.021836] [<ffffffff81247c18>] kernfs_fop_read+0xa8/0x160
[37315.021845] [<ffffffff811d0217>] __vfs_read+0x37/0x100
[37315.021854] [<ffffffff8126fa4e>] ? security_file_permission+0xae/0xc0
[37315.021860] [<ffffffff811d0ad7>] vfs_read+0x87/0x130
[37315.021866] [<ffffffff811d1875>] SyS_read+0x55/0xc0
[37315.021871] [<ffffffff8157626e>] entry_SYSCALL_64_fastpath+0x12/0x71
[37315.021944] colord-sane D ffff88022ef15200 0 26480 496 0x00000004
[37315.021950] ffff880155b3bcb8 0000000000000086 ffff880225692940 ffff880145451b80
[37315.021955] ffff880155b3bcb8 ffff880155b3c000 ffff88007fc7c0f4 ffff880145451b80
[37315.021960] 00000000ffffffff ffff88007fc7c0f8 ffff880155b3bcd8 ffffffff8157283e
[37315.021965] Call Trace:
[37315.021972] [<ffffffff8157283e>] schedule+0x3e/0x90
[37315.021977] [<ffffffff81572c25>] schedule_preempt_disabled+0x15/0x20
[37315.021983] [<ffffffff8157404a>] __mutex_lock_slowpath+0xca/0x140
[37315.021989] [<ffffffff815740db>] mutex_lock+0x1b/0x30
[37315.021998] [<ffffffffa0091657>] read_descriptors+0x37/0x100 [usbcore]
[37315.022003] [<ffffffff8124851a>] sysfs_kf_bin_read+0x4a/0x70
[37315.022007] [<ffffffff81247c18>] kernfs_fop_read+0xa8/0x160
[37315.022013] [<ffffffff811d0217>] __vfs_read+0x37/0x100
[37315.022019] [<ffffffff8126fa4e>] ? security_file_permission+0xae/0xc0
[37315.022024] [<ffffffff811d0ad7>] vfs_read+0x87/0x130
[37315.022030] [<ffffffff811d1875>] SyS_read+0x55/0xc0
[37315.022035] [<ffffffff8157626e>] entry_SYSCALL_64_fastpath+0x12/0x71
[37315.022043] Restarting tasks ... done.
[37315.056003] video LNXVIDEO:00: Restoring backlight state
[37315.056015] video LNXVIDEO:01: Restoring backlight state
[37315.056025] bbswitch: disabling discrete graphics
(In reply to Maxime Martineau from comment #10)
> I meant blacklisting the RUNTIME_PM option on these devices (gpu and hdd)

Oh yes of course, my brain is damaged..

> Here is the output :
>
> [37294.452987] PM: Syncing filesystems ... done.
> [37294.766167] PM: Preparing system for sleep (freeze)
> [37294.766317] bbswitch: enabling discrete graphics
> [37295.000885] Freezing user space processes ...
> [37315.021714] Freezing of tasks failed after 20.004 seconds (2 tasks
> refusing to freeze, wq_busy=0):
> [37315.021748] brscan-skey-0.2 D ffff88022ef15200 0 687 669
> 0x00000004

The error here means it is a user-space issue, not a kernel one. The process brscan-skey-0.2 is preventing the system from suspending. I just googled it; it seems to be a configuration-related tool for the Brother network wireless printer? Anyway, I believe it is a package you have installed for some hardware; the suspend should go well without it.
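As a rough aid for reading logs like the one quoted above, the following sketch extracts the names and PIDs of tasks reported after a "Freezing of tasks failed" message. It assumes the task header layout seen in this particular log (name, state D, address, free stack, PID, parent PID, flags); other kernel versions may print the header differently. The function name and the embedded sample are illustrative only.

```python
import re

# Matches task header lines such as:
# [37315.021748] brscan-skey-0.2 D ffff88022ef15200 0 687 669 0x00000004
TASK_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(\S+)\s+D\s+[0-9a-f]+\s+\d+\s+(\d+)\s+\d+\s+0x[0-9a-f]+\s*$"
)

SAMPLE = """\
[37295.000885] Freezing user space processes ...
[37315.021714] Freezing of tasks failed after 20.004 seconds (2 tasks refusing to freeze, wq_busy=0):
[37315.021748] brscan-skey-0.2 D ffff88022ef15200 0 687 669 0x00000004
[37315.021757] ffff8800c9983cb8 0000000000000086 ffff880225692940 ffff8800c983d280
[37315.021774] Call Trace:
[37315.021790] [<ffffffff8157283e>] schedule+0x3e/0x90
[37315.021944] colord-sane D ffff88022ef15200 0 26480 496 0x00000004
[37315.021965] Call Trace:
[37315.022043] Restarting tasks ... done.
"""


def blocked_tasks(log_text):
    """Return (name, pid) for each task listed after a freeze failure."""
    tasks = []
    in_failure = False
    for line in log_text.splitlines():
        if "Freezing of tasks failed" in line:
            in_failure = True
            continue
        if in_failure and "Restarting tasks" in line:
            in_failure = False
            continue
        if in_failure:
            match = TASK_RE.match(line.strip())
            if match:
                tasks.append((match.group(1), int(match.group(2))))
    return tasks


if __name__ == "__main__":
    for name, pid in blocked_tasks(SAMPLE):
        print(name, pid)
```

The output of `dmesg` could be passed to `blocked_tasks` in place of the embedded sample.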
I uninstalled brscan-skey and the bug has come back. I'm now trying a suggestion from TLP's developer (disabling a SATA power management option).
(In reply to Maxime Martineau from comment #12)
> I uninstalled brscan-skey and the bug has come again. I'm trying a

Is the log available when the bug came again?

> suggestion from TLP's developer (disabling some SATA power management option)

Good.
(In reply to Aaron Lu from comment #13)
> (In reply to Maxime Martineau from comment #12)
> > I uninstalled brscan-skey and the bug has come again. I'm trying a
>
> Is the log available when the bug came again?

Any update?
No bug with 7 days uptime with the following devices running without RUNTIME_PM:

00:02.0 VGA compatible controller: Intel Corporation Haswell-ULT Integrated Graphics Controller (rev 0b)
00:16.0 Communication controller: Intel Corporation 8 Series HECI #0 (rev 04)
00:1b.0 Audio device: Intel Corporation 8 Series HD Audio Controller (rev 04)
00:02.0 VGA compatible controller is the graphics card controlled by i915, and 00:1b.0 Audio device is the audio device controlled by snd-*. The 00:16.0 Communication controller shouldn't matter, I assume, so you can leave runtime PM enabled for it and see if things still go well. If so, try the other two one by one.
With the mei_me device running without runtime PM, I have had no issue since January.

/sys/bus/pci/devices/0000:00:02.0/power/control = auto (0x030000, VGA compatible controller, i915)
/sys/bus/pci/devices/0000:00:03.0/power/control = auto (0x040300, Audio device, snd_hda_intel)
/sys/bus/pci/devices/0000:00:04.0/power/control = auto (0x118000, Signal processing controller, proc_thermal)
/sys/bus/pci/devices/0000:00:14.0/power/control = auto (0x0c0330, USB controller, xhci_hcd)
/sys/bus/pci/devices/0000:00:16.0/power/control = on (0x078000, Communication controller, mei_me)
/sys/bus/pci/devices/0000:00:1b.0/power/control = auto (0x040300, Audio device, snd_hda_intel)
/sys/bus/pci/devices/0000:00:1c.0/power/control = auto (0x060400, PCI bridge, pcieport)
/sys/bus/pci/devices/0000:00:1c.3/power/control = auto (0x060400, PCI bridge, pcieport)
/sys/bus/pci/devices/0000:00:1c.4/power/control = auto (0x060400, PCI bridge, pcieport)
/sys/bus/pci/devices/0000:00:1f.0/power/control = auto (0x060100, ISA bridge, lpc_ich)
/sys/bus/pci/devices/0000:00:1f.2/power/control = auto (0x010601, SATA controller, ahci)
/sys/bus/pci/devices/0000:00:1f.3/power/control = auto (0x0c0500, SMBus, no driver)
/sys/bus/pci/devices/0000:00:1f.6/power/control = auto (0x118000, Signal processing controller, no driver)
/sys/bus/pci/devices/0000:02:00.0/power/control = auto (0x028000, Network controller, iwlwifi)
/sys/bus/pci/devices/0000:03:00.0/power/control = auto (0x030200, 3D controller, no driver)
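The same per-device state can be read directly from sysfs without TLP. The sketch below walks the power/control files shown in the listing above and reports which devices are pinned to "on" (runtime PM disabled). The function names and the `root` parameter are illustrative, not part of TLP; the parameter exists only so the code can be pointed at a test directory.

```python
from pathlib import Path


def runtime_pm_states(root="/sys/bus/pci/devices"):
    """Map each PCI address under `root` to the content of its power/control file."""
    states = {}
    for dev in sorted(Path(root).glob("*")):
        control = dev / "power" / "control"
        try:
            states[dev.name] = control.read_text().strip()
        except OSError:
            # Skip entries without a readable power/control file.
            continue
    return states


def always_on(states):
    """Return the addresses whose runtime PM is disabled (control == "on")."""
    return [addr for addr, value in states.items() if value == "on"]


if __name__ == "__main__":
    states = runtime_pm_states()
    for addr, value in states.items():
        print(addr, "=", value)
    print("runtime PM disabled for:", always_on(states))
```

On the configuration reported above, only 0000:00:16.0 (mei_me) would be expected in the final list.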
Trying this now on my UX32LN. Will report later.

tlp stat:
/sys/bus/pci/devices/0000:00:16.0/power/control = on (0x078000, Communication controller, mei_me)
Thanks for the finding and the testing; I have notified the mei_me author.

*** This bug has been marked as a duplicate of bug 102091 ***