Running Arch Linux x64 on an Asus ultrabook with kernels later than the current LTS kernel (3.14) causes the laptop to randomly freeze when suspending. This happens about 60-70% of the time the laptop is put to sleep: the screen turns black, the fans run at 100%, and a hard shutdown is required. This also happened from time to time on my older Asus UX51Vz; both machines run the latest BIOS revisions from Asus.
HW specs: Intel Core i7-4510U, 8GB RAM, Intel HD Graphics with NVIDIA Optimus (GeForce 840M), Intel 7260 AC Wi-Fi/Bluetooth.
Please follow this document to do some debugging: https://www.kernel.org/doc/Documentation/power/basic-pm-debugging.txt
Also, this looks like a regression. Your description doesn't make it clear, so I have to ask: does the problem first appear in v3.15? What about the latest upstream kernel, v4.1?
This seems to depend on whether I close the lid quickly or slowly. On kernels later than 3.14, closing the lid "slowly" makes the computer freeze on suspend (you can tell because the power LED is not blinking, the screen goes black and the fans race). Closing the lid slowly on kernel 3.14 causes no problem, and the computer suspends. This happens on kernels newer than 3.14 up to 4.0.9; I have not tried kernels later than 4.0.9. I am aware that this is not an easy problem for you to investigate, but I thought it better to file a bug in case others experience similar problems.
What happens if you issue the following command to enter suspend?
# echo mem > /sys/power/state
ping
Suspending with "echo mem > /sys/power/state" works 10/10 times with the following kernel:
$ uname -r
4.1.6.1-ck
I have no idea why moving the lid "slowly" would cause this problem. Perhaps you can do a git bisect between the good (v3.14) and bad (v3.15) kernels to find the offending commit?
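Each bisection step means building, installing and booting a kernel and then suspending it many times, so it is worth estimating the effort first: every good/bad verdict halves the remaining range, so bisecting N commits takes about ceil(log2(N)) steps. A sketch, with the usual manual git bisect session shown as comments; bisect_steps is a hypothetical helper, and N would come from git rev-list --count in a kernel tree.

```bash
#!/usr/bin/env bash
# Rough number of kernels to build and test when bisecting N commits:
# each good/bad verdict halves the range, so about ceil(log2(N)) steps.
bisect_steps() {
  local n=$1 steps=0
  while (( (1 << steps) < n )); do
    steps=$((steps + 1))
  done
  echo "$steps"
}

# Typical manual session in a kernel git tree. Every step needs a reboot,
# so "git bisect run" cannot automate it:
#   bisect_steps "$(git rev-list --count v3.14..v3.15)"
#   git bisect start v3.15 v3.14   # bad revision first, then good
#   ...build, install, boot, then close the lid slowly many times...
#   git bisect good                # or: git bisect bad
#   git bisect reset               # once git names the first bad commit
# The freeze is intermittent, so only call a kernel "good" after many cycles.
```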
I will look into it. I have also read reports from others with the same issue on this particular laptop; the UX32LN and UX303LN are 99% identical. https://bbs.archlinux.org/viewtopic.php?id=194413
To be more specific, this bug occurs either after long use or after a large number of suspend cycles. Generally speaking, the probability of hitting it increases with the time since boot. Also, on a UX303LN I was able to prevent the bug by disabling a PCI-related PM option (runtime power management for PCI devices). If you use TLP, the corresponding option is RUNTIME_PM_ON_BAT.
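Because the failure probability seems to grow with uptime and with the number of suspend cycles, an unattended loop that suspends and wakes repeatedly may shorten the wait for a reproduction. A sketch, assuming util-linux's rtcwake is installed, the RTC alarm can wake this machine, and the script runs as root; suspend_cycles, SUSPEND_CMD and PAUSE are hypothetical names. Note that this exercises the same path as "echo mem > /sys/power/state", not the lid-close path.

```bash
#!/usr/bin/env bash
# Suspend to RAM and wake via the RTC alarm, logging every cycle.
# SUSPEND_CMD (hypothetical) overrides the suspend command, e.g. for a dry run;
# PAUSE (hypothetical) is the number of seconds to stay awake between cycles.
suspend_cycles() {
  local cycles=$1 log=$2 i
  local cmd=${SUSPEND_CMD:-"rtcwake -m mem -s 20"}
  for ((i = 1; i <= cycles; i++)); do
    echo "cycle $i start $(date +%s)" >> "$log"
    sync   # flush the log so it survives a forced power-off
    $cmd || echo "cycle $i: suspend command failed" >> "$log"
    echo "cycle $i resumed" >> "$log"
    sleep "${PAUSE:-5}"
  done
}
```

For example, suspend_cycles 200 /var/tmp/suspend-cycles.log. After a hard power-off, the last "start" line without a matching "resumed" line identifies the cycle that hung.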
Hi Maxime, just to make sure: do you also have the suspend problem caused by closing the lid slowly?
Hi Aaron, I'll check this weekend. Also, I am on 4.1.4-1-ARCH, but I'm not sure I can reproduce the bug on this kernel.
Hi Aaron, still the same problem on 4.1.4 with RUNTIME_PM_ON_BAT=auto. But I couldn't trigger it by closing the lid slowly.
Hi Maxime, Then you have a different problem, please file a new bug about yours.
Just want to chime in here. I have an Asus UX303LN with the same specs as Petter K., and I *had* problems suspending; however, they were not related to how fast I closed the lid. I had this problem on kernels 3.13, 3.16 and 4.1.0, and there was no pattern as to when it would occur. What solved it for me was uninstalling TLP. Since removing TLP I have had dozens of successful suspend/resume cycles, and I can again achieve weeks of uptime on my laptop without problems. To confirm that a power-saving setting is the cause, I used powertop to enable all power-saving features and, boom, suspend failed again. Checking each power-saving feature individually will hopefully identify the culprit, but I don't have time for that kind of testing right now; I will try to get around to it later.
Hi J. Alexander, here are a few tickets about my case, and probably yours:
https://bugzilla.kernel.org/show_bug.cgi?id=105301
https://github.com/linrunner/TLP/issues/162
Thanks for the info Maxime. Meanwhile, it doesn't seem we can do anything about the slow lid close problem, unless someone is willing to do the git bisect(according to comment #3, this only occurs on kernels > v3.14), so I'll close it for now. If anyone has some new findings, please re-open it.
Hi Aaron. It seems that sometimes it doesn't matter anymore whether the lid is closed slowly or quickly. I'm running the latest Arch kernel, and it still freezes and the fans rev up. This happens regardless of AC/battery, and not only when closing the lid slowly. Sorry for the mistake.
(In reply to Petter K. from comment #17)
> Hi Aaron.
>
> It seems sometimes it doesnt matter wether the lid is closed slow or fast
> anymore. Running latest kernel for Arch.
>
> Still freeze and fans rev up.

Does the suspend always fail, or only sometimes?

> This happens regardless of AC/BAT and not only when closing the lid slow.
> Sorry for the mistake.

Do you have TLP enabled? If so, please disable or remove it. This bug doesn't cover TLP-related issues.
I have TLP enabled, but on AC power TLP's settings are disabled, at least the problematic PCI power management and runtime_pm ones, and the bug still occurs SOMETIMES. I can try to uninstall TLP entirely, but this also happened before I switched from laptop-mode-tools to TLP.
If you use neither TLP nor laptop-mode-tools, does the problem still occur?
(In reply to Aaron Lu from comment #20)
> If you do not use TLP and laptop-mode-tools, does the problem still occur?

Sometimes, yes. TLP is disabled in AC mode (at least most of its options). I can try to disable it completely.
Just wanted to say that I have exactly the same problem on a UX301LA. Freezes happen mostly after a period of heavy load; usually everything works fine for a while after a restart. When the problem occurs, the power LED starts blinking as it does during a working suspend, but the fan keeps running. Pressing a key then makes the power LED stop blinking, but the laptop does not wake up.
Hello, I'm also having the described problem; Andrzej's description couldn't be better. The system freezes after a longer period of use and some standby/resume cycles, sometimes even on the first try. Running Ubuntu 15.10:
Linux zen 4.2.0-19-generic #23-Ubuntu SMP Wed Nov 11 11:39:30 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
with the latest BIOS (211) on a UX301LA.
I experimented with disabling Runtime Power Management and it seems to make a huge difference. Previously I used Laptop Mode Tools or TLP and this problem occurred. Since I disabled runtime PM for PCI devices (in TLP: RUNTIME_PM_ON_BAT=on), I didn't have a single freeze. It remains to be seen which device caused the problem.
Thanks for that. I also haven't experienced a freeze since disabling PM for PCI devices. I wasn't using TLP or laptop-mode-tools, but had mis-tuned these settings manually via powertop...
Please find out which PCI device's runtime PM caused this issue, thanks.
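To narrow down which device matters, it helps to see which PCI devices currently allow runtime PM. A sketch that walks sysfs and prints each device's address, its power/control policy ("auto" means runtime PM is allowed, "on" keeps the device powered) and its power/runtime_status; the function name is hypothetical, and the root directory argument exists only so the function can be pointed at a copy of the tree.

```bash
#!/usr/bin/env bash
# Print "<address> <power/control> <power/runtime_status>" for each PCI device.
# The optional first argument replaces the real sysfs directory.
list_runtime_pm() {
  local root=${1:-/sys/bus/pci/devices} dev ctrl status
  for dev in "$root"/*; do
    [ -e "$dev/power/control" ] || continue
    ctrl=$(cat "$dev/power/control")
    # active, suspended, ...; "unknown" if the attribute is missing
    status=$(cat "$dev/power/runtime_status" 2>/dev/null || echo unknown)
    printf '%s %s %s\n' "${dev##*/}" "$ctrl" "$status"
  done
}
```

Flipping one device at a time from "auto" to "on" between test runs then isolates the culprit, at the cost of long waits given how rarely the freeze occurs.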
So far I know that after enabling PM for these three devices, the problem returned:
00:1b.0 Audio device: Intel Corporation 8 Series HD Audio Controller (rev 04)
00:16.0 Communication controller: Intel Corporation 8 Series HECI #0 (rev 04)
02:00.0 Network controller: Intel Corporation Wireless 7260 (rev 6b)
Will investigate further.
I have PM disabled only for
02:00.0 Network controller: Intel Corporation Wireless 7260 (rev 6b)
and the problem has not occurred on my UX301LA for a few days.
I did exactly the same and can also confirm no freezes over the last couple of days.
Then I'll move the bug to Drivers/Wireless.
BTW, which driver module does that wireless network controller use? lspci -v should tell you:
Kernel driver in use: XXX
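The same information is available without lspci: each bound PCI device has a driver symlink in sysfs, and lspci -k -s 02:00.0 prints it as "Kernel driver in use". A sketch (function name hypothetical) that resolves the symlink; note that sysfs addresses carry the PCI domain prefix, e.g. 0000:02:00.0.

```bash
#!/usr/bin/env bash
# Print the kernel driver bound to a PCI device, or "none" if no driver is bound.
# The optional second argument replaces the real sysfs directory.
pci_driver() {
  local bdf=$1 root=${2:-/sys/bus/pci/devices}
  local link="$root/$bdf/driver"
  if [ -L "$link" ]; then
    # The link points at .../bus/pci/drivers/<name>; keep only <name>.
    basename "$(readlink "$link")"
  else
    echo none
  fi
}
```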
it is iwlwifi.
Add Intel Linux Wireless <linuxwifi@intel.com> (supporter:INTEL WIRELESS WIFI LINK (iwlwifi))
Hi :) I maintain iwlwifi. I am not exactly an expert in runtime PM, but iwlwifi doesn't support it (yet):

    /* ... */
    static SIMPLE_DEV_PM_OPS(iwl_dev_pm_ops, iwl_pci_suspend, iwl_pci_resume);
    #define IWL_PM_OPS (&iwl_dev_pm_ops)
    #else
    #define IWL_PM_OPS NULL
    #endif

    static struct pci_driver iwl_pci_driver = {
        .name = DRV_NAME,
        .id_table = iwl_hw_card_ids,
        .probe = iwl_pci_probe,
        .remove = iwl_pci_remove,
        .driver.pm = IWL_PM_OPS,
    };

So I wonder how iwlwifi could be the culprit. Has anyone tried to get logs through netconsole, or can this perhaps be reproduced in a VM?
It sounds like a problem in how runtime PM and system PM interoperate. System suspend of iwlwifi works if runtime PM is not enabled for the network controller, but something goes wrong otherwise.
I see. We are now working on runtime PM enablement for iwlwifi and things seem to work. I am not sure how I can help here unfortunately...
If it is a driver-related issue, read the document below:
https://www.kernel.org/doc/Documentation/power/basic-pm-debugging.txt
Then run the "devices" and "platform" level tests, which should trigger an error:
1. Boot the system normally, without TLP, but with runtime PM enabled for the wireless network controller (I assume this setup causes the system suspend freeze?).
2. As root, run the test like this:
# cd /sys/power
# echo devices > pm_test
# echo mem > state
Wait until the system comes back by itself, then check dmesg for anything wrong, such as "XXX device failed to suspend". You may need to do this multiple times to trigger an error. If there is still no problem, continue to the next level, "platform":
# echo platform > pm_test
# echo mem > state
and redo the check.
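The per-level procedure above can be wrapped in a small helper that checks the level is actually offered, runs one test-mode suspend, and puts pm_test back to "none" so the next real suspend is not silently a test. A sketch, assuming root and a kernel built with CONFIG_PM_DEBUG (otherwise /sys/power/pm_test does not exist); the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Run one test-mode suspend at the given pm_test level, then restore "none".
# The optional second argument replaces /sys/power.
pm_test_cycle() {
  local level=$1 power=${2:-/sys/power} levels
  levels=$(<"$power/pm_test")   # e.g. "[none] core processors platform devices freezer"
  levels=${levels//\[/}         # drop the brackets around the active level
  levels=${levels//\]/}
  case " $levels " in
    *" $level "*) ;;
    *) echo "pm_test level '$level' not offered by this kernel" >&2; return 1 ;;
  esac
  echo "$level" > "$power/pm_test"
  echo mem > "$power/state"     # returns after the simulated suspend/resume
  echo none > "$power/pm_test"  # back to a real suspend for normal use
}
```

For example, run pm_test_cycle devices several times, then dmesg | grep -i -E 'fail|error'; move on to platform, processors and core if nothing shows up.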
Hello, I experienced two further freezes after enabling PM for all PCI devices except the Wi-Fi card:
echo 'auto' > '/sys/bus/pci/devices/0000:00:1f.2/power/control';
echo 'auto' > '/sys/bus/pci/devices/0000:00:02.0/power/control';
echo 'auto' > '/sys/bus/pci/devices/0000:00:04.0/power/control';
echo 'auto' > '/sys/bus/pci/devices/0000:00:14.0/power/control';
echo 'auto' > '/sys/bus/pci/devices/0000:00:16.0/power/control';
# echo 'auto' > '/sys/bus/pci/devices/0000:02:00.0/power/control';
echo 'auto' > '/sys/bus/pci/devices/0000:00:1f.3/power/control';
echo 'auto' > '/sys/bus/pci/devices/0000:00:1f.0/power/control';
echo 'auto' > '/sys/bus/pci/devices/0000:00:1c.3/power/control';
echo 'auto' > '/sys/bus/pci/devices/0000:00:1c.0/power/control';
echo 'auto' > '/sys/bus/pci/devices/0000:00:03.0/power/control';
echo 'auto' > '/sys/bus/pci/devices/0000:00:1b.0/power/control';
It sometimes takes up to five days for the bug to occur. In my experience it disappeared after changing any of the PM settings, but after some days of uptime it froze again spontaneously. :/ It will be hard to figure out which device (or combination) is causing this. Still trying. So far I have definitely had the best results with PM disabled entirely.
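The block of echo lines above can be generated with a loop, which makes it easier to move a single device in or out of the test set between runs. A sketch; the function name is hypothetical, devices are given as full sysfs addresses including the 0000: domain, and it must run as root on a real system (files it cannot write are skipped). Unlike the commented-out line above, excluded devices are set to "on" explicitly rather than left unchanged.

```bash
#!/usr/bin/env bash
# Set power/control to "auto" (runtime PM allowed) for every PCI device under
# $1, except the addresses given as further arguments, which are set to "on".
enable_runtime_pm_except() {
  local root=$1; shift
  local dev bdf skip x
  for dev in "$root"/*; do
    [ -w "$dev/power/control" ] || continue
    bdf=${dev##*/}
    skip=no
    for x in "$@"; do
      [ "$bdf" = "$x" ] && skip=yes
    done
    if [ "$skip" = yes ]; then
      echo on > "$dev/power/control"
    else
      echo auto > "$dev/power/control"
    fi
  done
}
```

For example, enable_runtime_pm_except /sys/bus/pci/devices 0000:02:00.0 reproduces the setup above.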
So I understand that this is not related to WiFi, right? Aaron, can you please take the bug back to the original component?
Sorry, I noticed that I have PM disabled for the Wi-Fi adapter and also for:
00:00.0 Host bridge: Intel Corporation Haswell-ULT DRAM Controller (rev 09)
Kernel driver in use: hsw_uncore
I will enable PM for the Wi-Fi device and report back in a few days on whether the crashes come back.
By following Aaron's PM debugging description, I could neither find any strange messages in dmesg nor provoke a freeze. But as I already said, I think we're all testing far too little: sometimes the freeze happens only after the 50th suspend cycle or so. Randomly...

However, after the "platform" tests the power LED keeps blinking after wakeup, when it should of course have stopped, and the whole keyboard backlight blinks in sync with it ;) I had PM enabled for all devices. The blinking stops after pressing any key. Somewhat strange.

The Wi-Fi card also seems to have issues after suspend (sometimes): you have to restart network-manager to reconnect properly. But I'm not sure whether this is an Ubuntu/network-manager-specific bug. Here are some log excerpts from this morning, when I got woken up because "the Internet isn't working...":

Dec 31 07:14:09 zen NetworkManager[1266]: <info> wake requested (sleeping: yes enabled: yes)
Dec 31 07:14:09 zen systemd[1]: Started Run anacron jobs at resume.
Dec 31 07:14:09 zen NetworkManager[1266]: <info> waking up...
Dec 31 07:14:09 zen NetworkManager[1266]: <info> (wlan0): device state change: unmanaged -> unavailable (reason 'managed') [10 20 2]
Dec 31 07:14:10 zen NetworkManager[1266]: <info> NetworkManager state is now DISCONNECTED
Dec 31 07:14:10 zen wpa_supplicant[1406]: dbus: wpa_dbus_get_object_properties: failed to get object properties: (none) none
Dec 31 07:14:10 zen wpa_supplicant[1406]: dbus: Failed to construct signal
Dec 31 07:14:10 zen wpa_supplicant[1406]: Could not read interface p2p-dev-wlan0 flags: No such device
Dec 31 07:14:10 zen NetworkManager[1266]: <info> (wlan0): supplicant interface state: starting -> ready
Dec 31 07:14:10 zen NetworkManager[1266]: <info> (wlan0): device state change: unavailable -> disconnected (reason 'supplicant-available') [20 30 42]
Dec 31 07:14:10 zen NetworkManager[1266]: <info> Device 'wlan0' has no connection; scheduling activate_check in 0 seconds.

Restarting network-manager helps most of the time; a second suspend does not. Sometimes only a reboot helps.

Dec 31 10:01:50 zen systemd[1]: Stopping Network Manager...
Dec 31 10:01:50 zen NetworkManager[1266]: <info> caught SIGTERM, shutting down normally.
Dec 31 10:01:50 zen NetworkManager[1266]: <info> (wlan0): device state change: disconnected -> unmanaged (reason 'unmanaged') [30 10 3]
Dec 31 10:01:50 zen NetworkManager[1266]: <info> exiting (success)
Dec 31 10:01:50 zen wpa_supplicant[1406]: nl80211: deinit ifname=p2p-dev-wlan0 disabled_11b_rates=0
Dec 31 10:01:50 zen systemd[1]: Stopped Network Manager.
Dec 31 10:01:50 zen systemd[1]: Starting Network Manager...
Dec 31 10:01:50 zen NetworkManager[5509]: <info> NetworkManager (version 1.0.4) is starting...
Dec 31 10:01:50 zen NetworkManager[5509]: <info> Read config: /etc/NetworkManager/NetworkManager.conf
Dec 31 10:01:50 zen systemd[1]: Started Network Manager.
Dec 31 10:01:50 zen NetworkManager[5509]: <info> VPN: loaded org.freedesktop.NetworkManager.pptp
Dec 31 10:01:50 zen NetworkManager[5509]: <info> DNS: loaded plugin dnsmasq
Dec 31 10:01:50 zen NetworkManager[5509]: <info> init!
Dec 31 10:01:50 zen NetworkManager[5509]: <info> update_system_hostname
Dec 31 10:01:50 zen NetworkManager[5509]: <info> interface-parser: parsing file /etc/network/interfaces
Dec 31 10:01:50 zen NetworkManager[5509]: <info> interface-parser: finished parsing file /etc/network/interfaces
Dec 31 10:01:50 zen NetworkManager[5509]: <info> management mode: unmanaged
Dec 31 10:01:50 zen NetworkManager[5509]: <info> devices added (path: /sys/devices/pci0000:00/0000:00:1c.3/0000:02:00.0/net/wlan0, iface: wlan0)
Dec 31 10:01:50 zen NetworkManager[5509]: <info> device added (path: /sys/devices/pci0000:00/0000:00:1c.3/0000:02:00.0/net/wlan0, iface: wlan0): no ifupdown configuration found.
Dec 31 10:01:50 zen NetworkManager[5509]: <info> devices added (path: /sys/devices/virtual/net/lo, iface: lo)
Dec 31 10:01:50 zen NetworkManager[5509]: <info> device added (path: /sys/devices/virtual/net/lo, iface: lo): no ifupdown configuration found.
Dec 31 10:01:50 zen NetworkManager[5509]: <info> end _init.
Before the problem with the Wi-Fi device/network-manager occurred, I found the following:
Dec 31 01:42:25 zen systemd-udevd[4642]: Process '/sbin/crda' failed with exit code 249.
That happened during suspend.
Just tried to enable netconsole to see what happens when the freeze occurs. Unfortunately I can't get it configured:
[ 143.101399] netpoll: netconsole: wlan0 doesn't support polling, aborting
The USB-LAN adapter I have seems to have the same problem:
[ 595.435335] netpoll: netconsole: enx0023574c2122 doesn't support polling, aborting
So no luck there. Tailing log files via ssh doesn't work either, as the messages of the suspend process are sent after wake-up, which probably won't happen in case of a freeze; I'll still try, maybe the Wi-Fi connection still comes up.
Regarding PM debugging: it was stable for the last 2 days while I left "platform" in pm_test. After changing back to "none", it froze after the first cycle. Again, I had PM enabled for all PCI devices.
If the platform test mode cannot trigger the problem, you may try the further levels: processors and core.
I can verify that disabling runtime_pm through TLP on a UX32LN (exact same specs as the UX303LN) has solved the issue with the computer freezing at suspend, so it has something to do with TLP's runtime PM setting. This is with iwlwifi too, but with runtime PM disabled on all devices.
BorisH mentioned this device:
00:00.0 Host bridge: Intel Corporation Haswell-ULT DRAM Controller (rev 09)
When I read his post, I also checked its PM settings, but there was no such device on my UX301LA. Today it's there. So it seems this device sometimes disappears (at least from the lspci listing). I just had a crash after rebooting to check whether the device reappears after a reboot (answer: yes). Before the last suspend, the device still existed. Before that reboot, the device was missing, and I had been testing with PM enabled on all remaining PCI devices, unable to produce a freeze for 2 days. It seems difficult to get into a state where it is missing (without a freeze); it takes many (good) suspend cycles. Hopefully I'll get there again soon to grab some logs.
It's the same for me. lspci does not always show the device; that's why I didn't mention it in my first post.
Okay. I wrote a little script to notify me when that device goes away, and just had some success, though I still have no idea what happens to PCI device 00:00.0 or what causes the freezes. The device disappeared immediately after I woke the UX301LA from a suspend of about 9 hours. I found some messages in dmesg concerning the graphics card; I've attached the output, covering two suspend cycles. Strange things happen from there on:
[ 8547.160634] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
Anyway, it survived the wakeup, even though PM was enabled for all devices again. I have no clue whether this error has anything to do with the freezes. Also, I just noticed the following message appearing sometimes during wakeup:
[10142.635421] xhci_hcd 0000:00:14.0: port 6 resume PLC timeout
There are no USB devices connected.
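A watcher of this kind can be as simple as polling sysfs for the device directory. A sketch, not the script used here: the function names, the 60-second default interval and the use of logger for the notification are assumptions; the address is the full sysfs form, e.g. 0000:00:00.0 for the host bridge.

```bash
#!/usr/bin/env bash
# Succeed if the PCI device directory exists. The optional second argument
# replaces the real sysfs directory.
pci_present() {
  [ -e "${2:-/sys/bus/pci/devices}/$1" ]
}

# Poll until the device disappears, then report it once.
watch_pci() {
  local bdf=$1 interval=${2:-60} root=${3:-/sys/bus/pci/devices}
  while pci_present "$bdf" "$root"; do
    sleep "$interval"
  done
  local msg="PCI device $bdf disappeared"
  echo "$(date): $msg"
  # Also leave a trace in the system log when logger(1) is available.
  command -v logger >/dev/null && logger -t pci-watch "$msg"
  return 0
}
```

For example, watch_pci 0000:00:00.0 60 & started after boot.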
Created attachment 199451 [details] dmesg output when pci device 00:00.0. disappeared and i915 errors occured. (at 8547)
Could someone test whether the crashes stop after adding this device:
00:16.0 Communication controller: Intel Corporation 8 Series HECI #0 (rev 04)
With at least these devices on the list, the crashes still occurred:
00:00.0 00:1f.3 00:1f.2 00:1f.0 00:1c.3 00:1c.0 00:1b.0
After adding the Communication controller to the list, I haven't had a crash for a few days.
(In reply to BorisH from comment #50)
> Could someone test if the crashes stop when adding the device:
>
> 00:16.0 Communication controller: Intel Corporation 8 Series HECI #0 (rev 04)
>
> At least with these devices on the list the crashes did still occur:
>
> 00:00.0 00:1f.3 00:1f.2 00:1f.0 00:1c.3 00:1c.0 00:1b.0
>
> After adding the Communication controller to this list i hadn't had a crash
> since a few days.

By "adding" I meant putting it on TLP's exclusion list.
So far I have had no crashes on my UX32LN since adding the mei_me device to the exclusion list in /etc/default/tlp. Normally I would have had at least a couple of crashes by now, given how often I suspend.
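For TLP users, the exclusion described above would look roughly like the fragment below. This is a sketch under assumptions: the option names are those of TLP 1.x, where the file was /etc/default/tlp; later releases moved the configuration to /etc/tlp.conf and renamed the *_BLACKLIST options to *_DENYLIST, so check the comments in your own config file. Setting RUNTIME_PM_DRIVER_BLACKLIST replaces the default driver list, so keep any entries your version ships with. Apply the change with "tlp start".

```bash
# /etc/default/tlp (TLP 1.x option names; verify against your version)
RUNTIME_PM_ON_AC=on
RUNTIME_PM_ON_BAT=auto

# Keep the HECI/MEI controller out of runtime PM by PCI address (lspci column 1)...
RUNTIME_PM_BLACKLIST="00:16.0"

# ...and/or by the driver bound to it.
RUNTIME_PM_DRIVER_BLACKLIST="mei_me"
```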
Adding the maintainer of this driver, Tomas Winkler.
Hi Tomas, people are experiencing problems when enabling runtime PM for the PCI device driven by mei_me.
*** Bug 105301 has been marked as a duplicate of this bug. ***
I'm looking into this. So far it seems to affect only the Haswell generation on kernels 3.14 to 4.2.0. Has anybody tried the latest kernel?
I am running 4.3.6 and have blacklisted runtime PM for the device because of the freezes.
Without blacklisting, the freezes also occur on kernel 4.4.1-2.
I also have not had a single freeze on my UX301LA since I re-enabled PM for all PCI devices except the mei_me one:
00:16.0 Communication controller: Intel Corporation 8 Series HECI #0 (rev 04)
Created attachment 206401 [details] get MEI fw version
I couldn't reproduce the issue on my systems, so I would like to know the MEI firmware version of this particular platform. The version is available either from the BIOS menu or via a simple script, getver-sa.py.
ME Code Firmware Version: 9.5.20.1742
ME NFTP Firmware Version: 9.5.20.1742
ME FITC Firmware Version: 9.5.15.1730
on an Asus UX32LN (identical to the UX303LN but with a lower screen resolution).
From my UX301LA:
ME Code Firmware Version: 9.5.30.1808
ME NFTP Firmware Version: 9.5.30.1808
ME FITC Firmware Version: 9.5.15.1730
From my UX301L:
ME Code Firmware Version: 9.5.3.1520
ME NFTP Firmware Version: 9.5.3.1520
ME FITC Firmware Version: 9.5.3.1520
From my UX303LN:
ME Code Firmware Version: 9.5.20.1742
ME NFTP Firmware Version: 9.5.20.1742
ME FITC Firmware Version: 9.5.15.1730
Thanks for posting. We cannot reproduce the issue on our setup with those particular FW versions, and we have no reports from other vendors, so I believe the issue is somehow specific to this vendor. It will take time until I have access to such a laptop, so I would be glad if someone could supply more debug info.

More verbose logging can be enabled by:
echo -n 'module mei +lfp' > /sys/kernel/debug/dynamic_debug/control
echo -n 'module mei_me +lfp' > /sys/kernel/debug/dynamic_debug/control

Register tracing can be enabled by:
echo 1 > /sys/kernel/debug/tracing/events/mei/enable
cat /sys/kernel/debug/tracing/trace

I hope something relevant can be caught during suspend/resume cycles.
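The two steps above can be wrapped in a pair of helpers: one turns the logging on (once after boot), and the other copies the trace buffer to a timestamped file after each resume so it is not lost. A sketch, assuming debugfs is mounted at /sys/kernel/debug and the commands run as root; the function names and the /var/tmp default are hypothetical. In the dynamic debug flags, p enables the debug messages, f adds the function name and l adds the line number.

```bash
#!/usr/bin/env bash
# Enable mei/mei_me dynamic debug output and the mei trace events.
# The optional first argument replaces /sys/kernel/debug.
mei_debug_on() {
  local dbg=${1:-/sys/kernel/debug}
  echo -n 'module mei +lfp' > "$dbg/dynamic_debug/control"
  echo -n 'module mei_me +lfp' > "$dbg/dynamic_debug/control"
  echo 1 > "$dbg/tracing/events/mei/enable"
}

# Copy the current trace buffer to a timestamped file and print its path.
save_mei_trace() {
  local dbg=${1:-/sys/kernel/debug} out=${2:-/var/tmp}
  local f="$out/mei-trace-$(date +%Y%m%d-%H%M%S).txt"
  cat "$dbg/tracing/trace" > "$f" || return 1
  sync   # get it onto disk before the next suspend attempt
  echo "$f"
}
```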
Created attachment 208911 [details] mei_me debug from syslog
reassign to mei_me expert.
(In reply to Manuel D. from comment #66)
> Created attachment 208911 [details]
> mei_me debug from syslog

Did you hit the freeze here? At a quick glance, the trace in the syslog looks like a good run.
Yes, this was a good run. I don't think I'll be able to catch the output of a bad run, as the laptop freezes completely. I tried netconsole with the Wi-Fi and a USB Ethernet dongle; neither device supports it. Maybe I can redirect the kernel's serial console to a USB-to-serial adapter and record it with another device?
Couldn't it be captured in pstore? https://www.kernel.org/doc/Documentation/ABI/testing/pstore
Hopefully. What a great "feature" of the kernel; I didn't know this was possible. Unfortunately CONFIG_PSTORE_CONSOLE is not enabled in Ubuntu kernels, so I'm compiling one with it enabled. I found a nice article on Stack Overflow about how to test whether pstore logging works. It may take a few more days for the bug to appear.
I tried to enable pstore, but it doesn't seem to work (at least not according to the testing procedure I found). Could someone help me get it working? Some more details are in my Unix & Linux Stack Exchange post: http://unix.stackexchange.com/questions/273352/how-to-enable-kernel-pstore
I was finally able to get my hands on a UX303LN. I installed Ubuntu 16.04 LTS (kernel 4.4.0-21), added the TLP package and enabled runtime PM also on AC, but so far I have not been able to reproduce the issue. ME FW version:
ME Code Firmware Version: 9.5.20.1742
ME NFTP Firmware Version: 9.5.20.1742
ME FITC Firmware Version: 9.5.15.1730
BIOS revision 4.6
I will try to stress it more in the ways described in the report; if someone has a reliable method of reproduction, please let me know. Thanks
I also upgraded to 16.04 in the meantime and just re-enabled PM for the suspected device to see if it happens again. There is no reliable way to reproduce this. I think the device disappearing, as I described earlier, is a good indicator of a coming freeze. Whenever I tried to reproduce it deliberately, it didn't happen at all. Uptime and activity seem to increase the chance of a freeze, so I recommend normal usage. Don't forget to save your work before closing the lid.
I just had a freeze after re-enabling PM for the mei_me device, during resume on battery after charging.
Linux zen 4.4.0-22-generic #39-Ubuntu SMP Thu May 5 16:53:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
This setup worked for us to enable pstore on Ubuntu 16.04 LTS:
1. Edit /etc/default/grub and add the following to GRUB_CMDLINE_LINUX_DEFAULT:
   mem=2G ramoops.mem_address=0x80000000 ramoops.mem_size=0x1000000 ramoops.ecc=1
   (mem=2G stops the kernel from using RAM above the 2 GiB mark, leaving the 16 MiB region at 0x80000000 free for ramoops.) Then run update-grub.
2. Add ramoops as the last line of /etc/modules.
3. After boot, stack traces can be found in /sys/fs/pstore.
We still were not able to reproduce the issue, though.
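Records in /sys/fs/pstore are easy to lose or overwrite, so it can help to archive them right after each boot. A sketch, assuming it runs as root early in boot; the function name and the /var/log/pstore-archive default are hypothetical. Per the pstore ABI document linked above, deleting a file from /sys/fs/pstore lets the backend reclaim its space, so once the copy is verified the originals can be removed.

```bash
#!/usr/bin/env bash
# Copy all pstore records into a timestamped directory under $2.
# The body runs in a subshell so the nullglob change does not leak out.
collect_pstore() (
  local src=${1:-/sys/fs/pstore} dest=${2:-/var/log/pstore-archive}
  local stamp
  stamp=$(date +%Y%m%d-%H%M%S)
  shopt -s nullglob
  local files=("$src"/*)
  if [ ${#files[@]} -eq 0 ]; then
    echo "no pstore records in $src"
    exit 1
  fi
  mkdir -p "$dest/$stamp"
  cp -- "${files[@]}" "$dest/$stamp/"
  echo "saved ${#files[@]} record(s) to $dest/$stamp"
)
```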
I followed your instructions to enable pstore and tested it by triggering a kernel panic via
echo c > /proc/sysrq-trigger
That worked. After two more days I just had a freeze after opening the lid, but there was nothing in /sys/fs/pstore this time. I'll leave it running this way and wait for the next freeze.
Finally, today we also hit the issue, and pstore was empty too, meaning no kernel panic is happening. We need to devise a different way of capturing what is happening.
Additional information: the hard drive is shut down when the bug occurs; only the fan keeps spinning.
I can reproduce this bug on a Sony Vaio Pro 13, which I believe has the identical Haswell chipset. I used this machine with Arch Linux without this issue for over a year, and it only started happening a few months ago. It is proving really hard to debug.
Hello there. I've hit a similar problem on a ThinkPad X230 with coreboot (inxi log attached). The first suspend always works; the second one always freezes. Unloading all mei* modules helps.

Using a 120 fps camera I was able to photograph the kernel log (picture attached). A short transcription (the flag values are probably not correct):

genirq: Flags mismatch irq 0: 00000080 (mei_me) vs 08915a00 (timer)
mei_me: ... request_threaded_irq failed: irq: 0
dpm_run_callback: pci_pm_resume+0x070xa0 returns -16
PM: Device 0000:00:16.0 failed to resume async: error -16
BUG: Unable to handle kernel NULL pointer dereference at ...
Oops ...
Call trace:
mei_me_pci_suspend+0x4d
...
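Since unloading the mei modules helps here, one way to automate that workaround is a systemd sleep hook: systemd runs every executable in /usr/lib/systemd/system-sleep/ (/lib/systemd/system-sleep/ on Debian and Ubuntu) with "pre" or "post" as the first argument and the sleep type as the second. A sketch, assuming removing mei_me is enough (this report unloaded all mei* modules); the hook name mei-unload and the MODPROBE override are hypothetical. While the driver is unloaded, nothing can use the ME interface.

```bash
#!/usr/bin/env bash
# Hypothetical hook, e.g. /usr/lib/systemd/system-sleep/mei-unload
# $1 = pre|post, $2 = suspend|hibernate|hybrid-sleep|...
# MODPROBE (hypothetical) overrides the modprobe command, e.g. for a dry run.
mei_sleep_hook() {
  local modprobe=${MODPROBE:-modprobe}
  case "${1:-}" in
    pre)  $modprobe -r mei_me ;;  # unload the ME interface driver before sleeping
    post) $modprobe mei_me ;;     # reload it after resume
  esac
}

mei_sleep_hook "${1:-}" "${2:-}"
```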
Created attachment 258543 [details] inxi device log
Created attachment 258545 [details] ooups picture
(In reply to derlafff@ya.ru from comment #83)
> Created attachment 258545 [details]
> ooups picture

What kernel version are you running? Can you please provide the dmidecode output?
Created attachment 258705 [details] dmidecode output Kernel is Linux 4.13.3-1-ARCH
This part of the kernel log appears after the first suspend, so I was able to retrieve it:
[ 560.723960] mei_me 0000:00:16.0: Refused to change power state, currently in D3
[ 560.784691] genirq: Flags mismatch irq 0. 00000080 (mei_me) vs. 00015a00 (timer)
[ 560.784754] mei_me 0000:00:16.0: request_threaded_irq failed: irq = 0.
Created attachment 258707 [details] dmesg after first suspend
If this is 4.12, then you should apply this patch; it has still not been applied: https://patchwork.kernel.org/patch/9967845/
I am experiencing this issue on an ASUS ZENBOOK UX301LAA (BIOS 211 06/05/2015). In my case the issue is extremely repeatable: after a power cycle, I can run 'echo mem > /sys/power/state' and wake up exactly once, and it works as expected; after that, only another power cycle makes suspend work again. If I run 'echo mem > /sys/power/state' a second time, the laptop appears to suspend but the fans keep spinning. The keyboard light pulses as it normally would in suspend. When I press a key in this state (to exit S3), the keyboard light turns solid white to indicate power, but neither the screen nor the network ever comes back, so I am unable to debug. Furthermore, if I enter/exit S3 exactly once and then reboot/shut down, the system hangs right before it actually reboots/goes back to POST, and I have to power cycle it manually anyway.

I am not running TLP or any other software that alters runtime PM settings. This behavior happens with a bone-stock Debian 9 install (it also happened with the backports kernel 4.16.x, which contains the patch Tomas provided above). I have tried many permutations of BIOS settings, cleaning up my DSDT tables, different kernels, etc., but the result is always the same: I can only suspend reliably once.

---

I read over this ticket and noticed Aaron's pm_test suggestion. When the system came back from suspend (and was thus in the 'wonky' state), I did:
echo devices > /sys/power/pm_test
echo mem > /sys/power/state
The test suspend completed successfully even though it was the second time, but the GPU locked up and my X session crashed. Fortunately, the system was responsive enough to let me collect some info. Maybe this hints that either the GPU or mei is the problem? I have i915 power optimizations turned on now; I will try with them turned off if needed, or go back to stock... The mei part is worrisome, though.

[ 974.039189] Restarting tasks ... done.
[ 977.546063] wlp2s0: authenticate with ... <OMITTED FOR PRIVACY>
[ 977.565371] wlp2s0: associated
[ 985.449148] [drm] GPU HANG: ecode 7:0:0x87c3ffff, in Xorg [549], reason: Hang on render ring, action: reset
[ 985.449149] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 985.449150] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 985.449150] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 985.449150] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 985.449151] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 985.449177] drm/i915: Resetting chip after gpu hang
[ 997.416363] drm/i915: Resetting chip after gpu hang
[ 1003.816187] mei_me 0000:00:16.0: timer: init clients timeout hbm_state = 2.
[ 1003.816201] mei_me 0000:00:16.0: unexpected reset: dev_state = INIT_CLIENTS fw status = 1E000045 60002106 00000200 00004400 00000000 40000010
[ 1034.055911] mei_me 0000:00:16.0: timer: init clients timeout hbm_state = 2.
[ 1034.055937] mei_me 0000:00:16.0: unexpected reset: dev_state = INIT_CLIENTS fw status = 1E000045 60002106 00000200 00004400 00000000 40000010
[ 1064.295470] mei_me 0000:00:16.0: timer: init clients timeout hbm_state = 2.
[ 1064.295484] mei_me 0000:00:16.0: unexpected reset: dev_state = INIT_CLIENTS fw status = 1E000045 60002106 00000200 00004400 00000000 40000010
[ 1064.295485] mei_me 0000:00:16.0: reset: reached maximal consecutive resets: disabling the device
Sorry for double-posting. The GPU hang only shows up with the 4.9 kernel; it is gone with 4.16.5... so progress is being made! The mei messages/issues after suspending once and then fiddling with pm_test persist, but now I must use 'freezer' rather than 'devices' for pm_test to reliably trigger the same mei error messages seen above.
I finally got suspend working reliably on this platform (and with runtime PM to boot!). I had a suspicion about the odd ME activity above and followed through on it... I "only" had to temporarily nuke my entire system, install Windows 8, and flash the latest version of the Intel ME firmware using a package from another vendor (because Intel doesn't provide images directly, and my vendor has not released an update in years). I'll stop ranting now ;) but you can see why people hate this bloody ME thing...

If you're looking to reproduce the issue, here are the firmware versions. After flashing, the entire platform and all power-saving features work perfectly.

Before:
ME Code Firmware Version: 9.5.3.1520
ME NFTP Firmware Version: 9.5.3.1520
ME FITC Firmware Version: 9.5.3.1520

After:
ME Code Firmware Version: 9.5.60.1952
ME NFTP Firmware Version: 9.5.60.1952
ME FITC Firmware Version: 9.5.3.1520
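To check whether a machine's ME code firmware is older than 9.5.60.1952, the version reported here after the fix (a single report, not a confirmed minimum), a dotted-version comparison is enough. A sketch using GNU sort -V; the installed version still has to be read from the BIOS or the getver script mentioned earlier, and the function name version_lt is hypothetical.

```bash
#!/usr/bin/env bash
# Succeed if dotted version $1 is strictly older than $2.
# Relies on GNU coreutils' "sort -V" (natural version ordering).
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}
```

For example, version_lt 9.5.20.1742 9.5.60.1952 succeeds, so a UX303LN with the firmware listed in this thread is older than the reported working version.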
I have not had a single freeze over the past two years with PM enabled for all devices except this one (as described in comment 58):
# echo 'auto' > '/sys/bus/pci/devices/0000:00:16.0/power/control'; # mei_me
I had almost forgotten that commented-out line and this issue... But I just tested whether I could still trigger the issue by (re-)activating PM for it, and yes: it happened twice within two days. It only happened when closing the lid; standby doesn't seem to trigger it. I'm running Ubuntu 18.04 (kernel 4.15.0-23-generic) today and still have the same ME firmware as in comment 62.
@Tyler S: could you tell me which vendor's package you used? I'd like to confirm whether the upgrade you described also fixes it for me. At least we have the exact same model, and I already have Windows (10) installed in parallel.
@Manuel D: I have an Asus Zenbook UX32LN and have been suffering from the "suspend and hibernate are unreliable" problem for years now. The problem has occurred under Debian Jessie, Stretch and now Buster. I also tried Arch, Ubuntu and Fedora, all with the same result. Currently I'm using LMDE 3 Cindy (based on Debian Stretch), with stock kernel 4.9.0-8-amd64 and the latest backported kernel 4.18.0-2-amd64. Using this laptop under Linux has been painful due to some long-standing bugs: the non-working brightness keys were only fixed in 4.10, while the unstable suspend/resume is still not fixed.

Today I stumbled upon this bug, and with a hint from @Tyler S I was able to update the ME firmware to a more recent version. The hint was to find a Lenovo notebook model with a similar CPU and chipset and download the Intel ME FW update for it. WARNING: do this at your own risk; you might brick your laptop. I used the files from here: https://pcsupport.lenovo.com/ru/ru/products/laptops-and-netbooks/thinkpad-x-series-laptops/thinkpad-x240s/20aj/downloads/ds038194

After the update:
ME Code Firmware Version: 9.5.60.1952
ME NFTP Firmware Version: 9.5.60.1952
ME FITC Firmware Version: 9.5.15.1730

I will check whether this has finally fixed the suspend/resume unreliability and report any findings here. I really hope it has, as it is a huge struggle to use an ultrabook without hibernate.