Bug 105301

Summary: Suspend sometimes fails if mei_me RUNTIME_PM enabled - Zenbook UX303
Product: Power Management
Component: Hibernation/Suspend
Reporter: Maxime Martineau (emixam.agp)
Assignee: Rafael J. Wysocki (rjw)
Status: CLOSED DUPLICATE
Severity: normal
Priority: P1
CC: aaron.lu, lenb, petter3k, rui.zhang
Hardware: All
OS: Linux
Kernel Version: 4.1.6-1-ARCH
Subsystem:
Regression: No
Bisected commit-id:

Description Maxime Martineau 2015-10-01 11:13:41 UTC
Running Arch Linux on an ASUS notebook (Zenbook UX303), I noticed an issue that occurs from time to time when suspending.

It happens maybe every two days (when suspending multiple times without rebooting or shutting down) or after the notebook has been in use for a long time.

The notebook doesn't complete its transition to the suspended state: some internal component (possibly the HDD) keeps running and I cannot resume the system. I can only perform a hard shutdown (pressing the power button for 10 seconds).

The peculiar aspect of this bug is that it only occurs when a certain PM option is activated (Runtime Power Management for PCI, RUNTIME_PM_ON_BAT=auto with TLP).
Comment 1 Aaron Lu 2015-10-08 07:41:25 UTC
Maybe also worth reporting this issue to TLP.

BTW, is it possible to find out what exactly TLP does on your laptop?
Comment 2 Zhang Rui 2015-10-26 06:18:19 UTC
(In reply to Maxime Martineau from comment #0)
> Running Archlinux on some asus notebook (Zenbook UX303), I noticed an issue
> occuring when suspending from time to time.
> 
> It happens maybe every two days (when suspending multiple times without
> rebooting or shutting down) or after a long usage of the notebook.
> 
> The notebook doesn't complete its transition to the suspended state : some
> inner parts (possibly the hdd) is still running and I cannot return to the
> system. I can only perform a hard shut down (10 sec press on the power
> button.)
> 
> The particular aspect of this bug is that it is only occuring when a certain
> PM option is activated (Runtime Power Management for PCI,
> RUNTIME_PM_ON_BAT=auto with TLP)

Does this problem still exist if the laptop is running with AC plugged in?
Comment 3 Petter Krossbakken 2015-11-11 23:22:29 UTC
My UX32LN (identical to the 303LN) is having this exact issue on both AC and BATTERY mode with TLP enabled.
Comment 4 Aaron Lu 2015-11-12 04:50:58 UTC
Maxime reported this to TLP here:
https://github.com/linrunner/TLP/issues/162

Maxime,

Did you try the TLP developer's suggestions? i.e.

"
Hi, this is a (common) driver/kernel issue. I suggest to:

    Try a newer/older kernel
    Disable completely on battery via RUNTIME_PM_ON_BAT=on
    Blacklist PCIe devices one by one in RUNTIME_PM_BLACKLIST= and re-check to isolate the offending device

[1] http://linrunner.de/en/tlp/docs/tlp-configuration.html#runtimepm
"
Comment 5 Petter Krossbakken 2015-11-12 09:14:08 UTC
The computer still freezes on suspend with the newest kernel, regardless of AC/BAT. That is, I assume runtime PM is turned off when AC mode is active (either forced or plugged in).
Comment 6 Maxime Martineau 2015-11-12 09:26:36 UTC
Aaron, I am currently changing the RUNTIME_PM blacklist each time the bug happens, but it takes a long time to occur. I will keep track of my progress on this ticket and on the one in TLP's bug tracker.

Petter, you're right: RUNTIME_PM is deactivated by default on bat.

Zhang, I can't say whether the bug occurs on AC or not. All I can say is that the bug does not occur when the laptop has been running without RUNTIME_PM since boot.
Comment 7 Maxime Martineau 2015-11-15 23:20:57 UTC
I blacklisted the discrete graphics card (GeForce 840M) and achieved 3 days of uptime (i.e. 3 days without hitting the bug on suspend and being forced to reboot).

I will give it a few more days to be sure, but this looks like a promising lead.

Note that this NVIDIA card is managed by bbswitch (which keeps it off most of the time) and bumblebee (which enables it on demand).
Comment 8 Maxime Martineau 2015-11-19 21:45:22 UTC
In fact, the bug has occurred again.

I think it may be the HDD that doesn't stop. I blacklisted it and, after a few hours and several suspends, the laptop now refuses to suspend at all. Here is the log it yields:

http://pastebin.com/x26GKiVf
Comment 9 Aaron Lu 2015-11-20 01:47:10 UTC
For some reason, I can't open the pastebin.com website...

And I don't quite understand what "blacklist" means here. Did you mean blacklisting the driver? But how can you use the system if you blacklist the hard drive's driver?
The same goes for the GPU: do you mean you have blacklisted the NVIDIA GPU's driver?
Comment 10 Maxime Martineau 2015-11-20 07:01:45 UTC
I meant blacklisting the RUNTIME_PM option on these devices (GPU and HDD).

Here is the output:

[37294.452987] PM: Syncing filesystems ... done.
[37294.766167] PM: Preparing system for sleep (freeze)
[37294.766317] bbswitch: enabling discrete graphics
[37295.000885] Freezing user space processes ...
[37315.021714] Freezing of tasks failed after 20.004 seconds (2 tasks refusing to freeze, wq_busy=0):
[37315.021748] brscan-skey-0.2 D ffff88022ef15200     0   687    669 0x00000004
[37315.021757]  ffff8800c9983cb8 0000000000000086 ffff880225692940 ffff8800c983d280
[37315.021763]  ffff8800c9983cb8 ffff8800c9984000 ffff88007fc7c0f4 ffff8800c983d280
[37315.021768]  00000000ffffffff ffff88007fc7c0f8 ffff8800c9983cd8 ffffffff8157283e
[37315.021774] Call Trace:
[37315.021790]  [<ffffffff8157283e>] schedule+0x3e/0x90
[37315.021797]  [<ffffffff81572c25>] schedule_preempt_disabled+0x15/0x20
[37315.021804]  [<ffffffff8157404a>] __mutex_lock_slowpath+0xca/0x140
[37315.021810]  [<ffffffff815740db>] mutex_lock+0x1b/0x30
[37315.021823]  [<ffffffffa0091657>] read_descriptors+0x37/0x100 [usbcore]
[37315.021831]  [<ffffffff8124851a>] sysfs_kf_bin_read+0x4a/0x70
[37315.021836]  [<ffffffff81247c18>] kernfs_fop_read+0xa8/0x160
[37315.021845]  [<ffffffff811d0217>] __vfs_read+0x37/0x100
[37315.021854]  [<ffffffff8126fa4e>] ? security_file_permission+0xae/0xc0
[37315.021860]  [<ffffffff811d0ad7>] vfs_read+0x87/0x130
[37315.021866]  [<ffffffff811d1875>] SyS_read+0x55/0xc0
[37315.021871]  [<ffffffff8157626e>] entry_SYSCALL_64_fastpath+0x12/0x71
[37315.021944] colord-sane     D ffff88022ef15200     0 26480    496 0x00000004
[37315.021950]  ffff880155b3bcb8 0000000000000086 ffff880225692940 ffff880145451b80
[37315.021955]  ffff880155b3bcb8 ffff880155b3c000 ffff88007fc7c0f4 ffff880145451b80
[37315.021960]  00000000ffffffff ffff88007fc7c0f8 ffff880155b3bcd8 ffffffff8157283e
[37315.021965] Call Trace:
[37315.021972]  [<ffffffff8157283e>] schedule+0x3e/0x90
[37315.021977]  [<ffffffff81572c25>] schedule_preempt_disabled+0x15/0x20
[37315.021983]  [<ffffffff8157404a>] __mutex_lock_slowpath+0xca/0x140
[37315.021989]  [<ffffffff815740db>] mutex_lock+0x1b/0x30
[37315.021998]  [<ffffffffa0091657>] read_descriptors+0x37/0x100 [usbcore]
[37315.022003]  [<ffffffff8124851a>] sysfs_kf_bin_read+0x4a/0x70
[37315.022007]  [<ffffffff81247c18>] kernfs_fop_read+0xa8/0x160
[37315.022013]  [<ffffffff811d0217>] __vfs_read+0x37/0x100
[37315.022019]  [<ffffffff8126fa4e>] ? security_file_permission+0xae/0xc0
[37315.022024]  [<ffffffff811d0ad7>] vfs_read+0x87/0x130
[37315.022030]  [<ffffffff811d1875>] SyS_read+0x55/0xc0
[37315.022035]  [<ffffffff8157626e>] entry_SYSCALL_64_fastpath+0x12/0x71
 
[37315.022043] Restarting tasks ... done.
[37315.056003] video LNXVIDEO:00: Restoring backlight state
[37315.056015] video LNXVIDEO:01: Restoring backlight state
[37315.056025] bbswitch: disabling discrete graphics
Comment 11 Aaron Lu 2015-11-20 07:29:26 UTC
(In reply to Maxime Martineau from comment #10)
> I meant blacklisting the RUNTIME_PM option on these devices (gpu and hdd)

Oh yes of course, my brain is damaged..

> 
> Here is the output :
> 
> [37294.452987] PM: Syncing filesystems ... done.
> [37294.766167] PM: Preparing system for sleep (freeze)
> [37294.766317] bbswitch: enabling discrete graphics
> [37295.000885] Freezing user space processes ...
> [37315.021714] Freezing of tasks failed after 20.004 seconds (2 tasks
> refusing to freeze, wq_busy=0):
> [37315.021748] brscan-skey-0.2 D ffff88022ef15200     0   687    669
> 0x00000004

The error here means it is a user space issue, not a kernel one.
The process brscan-skey-0.2 is preventing the system from suspending.
I just googled it; it seems to be a configuration tool for a Brother network/wireless printer?
Anyway, I believe it is a package you have installed for some hardware; suspend should work without it.
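
For readers who want to pull the same information out of a longer dmesg capture, here is a minimal Python sketch that lists the tasks the freezer reported as refusing to freeze. The sample lines are copied from the log in comment 10 with most stack frames omitted; the regular expression and the function name `blocked_tasks` are illustrative assumptions and may need adjusting for other kernel versions, whose task dump layout can differ.

```python
import re

# Sample lines copied from the log in comment 10 (most stack frames omitted).
LOG = """\
[37295.000885] Freezing user space processes ...
[37315.021714] Freezing of tasks failed after 20.004 seconds (2 tasks refusing to freeze, wq_busy=0):
[37315.021748] brscan-skey-0.2 D ffff88022ef15200     0   687    669 0x00000004
[37315.021774] Call Trace:
[37315.021790]  [<ffffffff8157283e>] schedule+0x3e/0x90
[37315.021944] colord-sane     D ffff88022ef15200     0 26480    496 0x00000004
[37315.021965] Call Trace:
[37315.022043] Restarting tasks ... done.
"""

# Assumed layout of a task header line in this log:
# [timestamp] <comm> D <16 hex digits> <stack> <pid> <parent pid> <flags>
TASK_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s(\S+)\s+D\s+[0-9a-f]{16}\s+\d+\s+(\d+)\s+\d+\s+0x[0-9a-f]+$"
)


def blocked_tasks(log_text):
    """Return (command, pid) pairs listed after 'Freezing of tasks failed'."""
    tasks = []
    in_failure = False
    for line in log_text.splitlines():
        if "Freezing of tasks failed" in line:
            in_failure = True
            continue
        if "Restarting tasks" in line:
            in_failure = False
        if in_failure:
            match = TASK_RE.match(line)
            if match:
                tasks.append((match.group(1), int(match.group(2))))
    return tasks


if __name__ == "__main__":
    for command, pid in blocked_tasks(LOG):
        print(command, pid)
```

The sketch only names the tasks; it does not decide whether the cause lies in user space or in the driver the tasks are blocked in.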
Comment 12 Maxime Martineau 2015-11-21 11:40:30 UTC
I uninstalled brscan-skey and the bug has come back. I'm trying a suggestion from TLP's developer (disabling a SATA power management option).
Comment 13 Aaron Lu 2015-11-23 02:04:26 UTC
(In reply to Maxime Martineau from comment #12)
> I uninstalled brscan-skey and the bug has come again. I'm trying a

Is the log available when the bug came again?

> suggestion from TLP's developer (disabling some SATA power management option)

Good.
Comment 14 Aaron Lu 2015-12-16 02:50:09 UTC
(In reply to Aaron Lu from comment #13)
> (In reply to Maxime Martineau from comment #12)
> > I uninstalled brscan-skey and the bug has come again. I'm trying a
> 
> Is the log available when the bug came again?

Any update?
Comment 15 Maxime Martineau 2016-01-12 10:10:56 UTC
No bug in 7 days of uptime with runtime PM disabled for the following devices:

00:02.0 VGA compatible controller: Intel Corporation Haswell-ULT Integrated Graphics Controller (rev 0b)
00:16.0 Communication controller: Intel Corporation 8 Series HECI #0 (rev 04)
00:1b.0 Audio device: Intel Corporation 8 Series HD Audio Controller (rev 04)
Comment 16 Aaron Lu 2016-01-13 01:58:04 UTC
00:02.0 VGA compatible controller is the graphics card controlled by i915, and 00:1b.0 Audio device is the audio device controlled by snd-*. The 00:16.0 Communication controller shouldn't matter, I assume, so you can leave runtime PM enabled for it and see if things still go well. If so, try the other two one by one.
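
The one-by-one procedure suggested here can be sketched as a short Python helper that, starting from the known-good blacklist of comment 15, produces one trial configuration per device with that device removed from the blacklist. The function name `trial_blacklists` is hypothetical, and the value format (space-separated PCI addresses in double quotes) is an assumption about TLP's configuration syntax that should be checked against the TLP documentation.

```python
# Devices that were running without runtime PM in the 7-day test (comment 15).
KNOWN_GOOD = ["00:02.0", "00:16.0", "00:1b.0"]


def trial_blacklists(known_good, order):
    """Build one trial per device: re-enable runtime PM for that device only.

    Returns a list of (device_re_enabled, config_line) tuples, in the given order.
    The config line format is an assumption, not taken from the thread.
    """
    trials = []
    for device in order:
        if device not in known_good:
            raise ValueError("unknown device: %s" % device)
        remaining = [d for d in known_good if d != device]
        line = 'RUNTIME_PM_BLACKLIST="%s"' % " ".join(remaining)
        trials.append((device, line))
    return trials


if __name__ == "__main__":
    # Start with 00:16.0, as suggested, then the other two.
    for device, line in trial_blacklists(KNOWN_GOOD, ["00:16.0", "00:02.0", "00:1b.0"]):
        print("re-enable runtime PM for %s -> %s" % (device, line))
```

Because the failure takes days to show up, each trial has to run long enough to be meaningful before moving on to the next one.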
Comment 17 Maxime Martineau 2016-02-22 10:17:44 UTC
With runtime PM disabled for the mei_me device, I have had no issues since January.

/sys/bus/pci/devices/0000:00:02.0/power/control = auto (0x030000, VGA compatible controller, i915)
/sys/bus/pci/devices/0000:00:03.0/power/control = auto (0x040300, Audio device, snd_hda_intel)
/sys/bus/pci/devices/0000:00:04.0/power/control = auto (0x118000, Signal processing controller, proc_thermal)
/sys/bus/pci/devices/0000:00:14.0/power/control = auto (0x0c0330, USB controller, xhci_hcd)
/sys/bus/pci/devices/0000:00:16.0/power/control = on   (0x078000, Communication controller, mei_me)
/sys/bus/pci/devices/0000:00:1b.0/power/control = auto (0x040300, Audio device, snd_hda_intel)
/sys/bus/pci/devices/0000:00:1c.0/power/control = auto (0x060400, PCI bridge, pcieport)
/sys/bus/pci/devices/0000:00:1c.3/power/control = auto (0x060400, PCI bridge, pcieport)
/sys/bus/pci/devices/0000:00:1c.4/power/control = auto (0x060400, PCI bridge, pcieport)
/sys/bus/pci/devices/0000:00:1f.0/power/control = auto (0x060100, ISA bridge, lpc_ich)
/sys/bus/pci/devices/0000:00:1f.2/power/control = auto (0x010601, SATA controller, ahci)
/sys/bus/pci/devices/0000:00:1f.3/power/control = auto (0x0c0500, SMBus, no driver)
/sys/bus/pci/devices/0000:00:1f.6/power/control = auto (0x118000, Signal processing controller, no driver)
/sys/bus/pci/devices/0000:02:00.0/power/control = auto (0x028000, Network controller, iwlwifi)
/sys/bus/pci/devices/0000:03:00.0/power/control = auto (0x030200, 3D controller, no driver)
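
To check a listing like the one above without reading it line by line, the following Python sketch parses lines in this format and reports the devices whose `power/control` is `on`, which in sysfs means the device is kept powered and runtime PM is not used for it. The regular expression and the names `parse_line` and `runtime_pm_disabled` are illustrative assumptions based only on the lines shown in this comment.

```python
import re

# Assumed line format, taken from the listing in this comment:
# <sysfs path> = <on|auto> (<class code>, <description>, <driver>)
LINE_RE = re.compile(
    r"^/sys/bus/pci/devices/(\S+)/power/control\s*=\s*(\w+)\s*"
    r"\((0x[0-9a-f]+),\s*(.*),\s*([^,]+)\)$"
)


def parse_line(line):
    """Parse one listing line into a dict, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    address, control, class_code, description, driver = match.groups()
    return {
        "address": address,
        "control": control,
        "class": class_code,
        "description": description,
        "driver": driver,
    }


def runtime_pm_disabled(lines):
    """Return (address, driver) for every device whose control value is 'on'."""
    result = []
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry["control"] == "on":
            result.append((entry["address"], entry["driver"]))
    return result


if __name__ == "__main__":
    sample = [
        "/sys/bus/pci/devices/0000:00:14.0/power/control = auto (0x0c0330, USB controller, xhci_hcd)",
        "/sys/bus/pci/devices/0000:00:16.0/power/control = on   (0x078000, Communication controller, mei_me)",
        "/sys/bus/pci/devices/0000:00:1f.3/power/control = auto (0x0c0500, SMBus, no driver)",
    ]
    print(runtime_pm_disabled(sample))
```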
Comment 18 Petter Krossbakken 2016-02-22 10:59:42 UTC
Trying this now on my UX32LN. Will report later.


tlp stat:

/sys/bus/pci/devices/0000:00:16.0/power/control = on   (0x078000, Communication controller, mei_me)
Comment 19 Aaron Lu 2016-02-24 02:57:29 UTC
Thanks for the finding and the testing; I have notified the mei_me author.

*** This bug has been marked as a duplicate of bug 102091 ***