Bug 199025 - Suspend hangs with Nouveau loaded. Never fully suspends and impossible to return to running state. - Asus PRIME Z270-P
Status: NEEDINFO
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(Other)
Hardware: Intel Linux
Importance: P1 normal
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-03-06 03:19 UTC by todd
Modified: 2018-11-09 02:51 UTC
CC List: 8 users

See Also:
Kernel Version: 4.15.x and 4.16.x
Subsystem:
Regression: No
Bisected commit-id:


Attachments
This is a text file with output from os-release, uname, lshw, lsmod, dmesg, etc. (237.99 KB, text/plain)
2018-03-06 03:19 UTC, todd
hardware and driver info from nvidia driver / nouveau blacklist (262.77 KB, text/plain)
2018-03-16 04:19 UTC, todd
result command $dmesg | grep -i acpi (9.25 KB, text/plain)
2018-03-18 11:39 UTC, Raphael Fontoura
uname -r && lsmod && dnf info xorg (7.40 KB, text/plain)
2018-03-24 14:25 UTC, Raphael Fontoura
dnf info xorg-x11-drv-nouveau and intel (1.42 KB, application/octet-stream)
2018-03-24 14:31 UTC, Raphael Fontoura

Description todd 2018-03-06 03:19:01 UTC
Created attachment 274577 [details]
This is a text file with output from os-release, uname, lshw, lsmod, dmesg, etc.

Description of problem:
System hangs when going into suspend mode. The monitor goes to sleep, but the PC never fully suspends: the CPU and case fans keep running, the HDD keeps spinning, and the power light stays on. While in this hung state you cannot use Ctrl+Alt+Del, you cannot use Ctrl+Alt+F<n> to switch between TTYs, you cannot wake the monitor under any circumstance, and you cannot wake the system by pressing the power button. You cannot SSH into the PC either, even though I can when the machine is running normally. When you press keys on the keyboard, the key backlights light up, but nothing else happens.

At this point the machine becomes completely unresponsive to anything but holding the power button down for 5 seconds or pressing the reset button. 


The above issue happens under the default graphical startup mode
(systemctl get-default)
graphical.target


This problem started in kernel 4.15.X
While using kernel 4.14.16 everything works perfectly.


How reproducible:
Completely 100% reproducible every time when starting in graphical mode.

Steps to Reproduce:
1. Let the machine automatically go into suspend mode on its own.
 OR
2. Use the GUI to select logout and suspend.
 OR
3. Type systemctl suspend into a terminal.

All 3 cause the exact same issue.


Expected results:
Normally the machine is supposed to essentially turn off. The power light should start flashing, and the fans and HDD should stop spinning. Most importantly, when you press the ESC key, the machine and monitor should come back up.


Additional info:
If you change the login mode (runlevel) from graphical to text using the command...

   systemctl set-default multi-user.target

...to start the machine in text mode and follow that by logging in and then issuing the command...

   startx

...then suspend works perfectly. The machine suspends normally, the fans and HDD stop spinning, and the power light starts flashing on and off, just like normal.
And most importantly, the machine also turns back on and the monitor also turns back on normally.
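
As a rough illustration of the workaround above, here is a minimal sketch of the runlevel-to-target mapping it relies on. The helper name runlevel_to_target is hypothetical (it is not part of systemd), and the script only prints the systemctl command rather than running it.

```shell
#!/bin/sh
# runlevel_to_target: hypothetical helper, not part of systemd.
# Maps the two SysV-style runlevels discussed in this report to the
# systemd targets that replace them.
runlevel_to_target() {
    case "$1" in
        3) echo "multi-user.target" ;;   # text-mode login
        5) echo "graphical.target" ;;    # graphical login
        *) echo "unsupported runlevel: $1" >&2; return 1 ;;
    esac
}

# Print (do not run) the command that would make text mode the default.
# Running the printed command needs root and applies from the next boot.
echo "systemctl set-default $(runlevel_to_target 3)"
```

After booting into text mode this way, X is started by hand with startx, as described above.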


***The attached file is a text document with as much machine info as I could think to provide up front***
Comment 1 Peter Forsberg 2018-03-09 21:19:31 UTC
Hi,
I have the exact same problem. Please let me know what information I can provide to help analyze the problem. 
// Peter
Comment 2 todd 2018-03-10 05:14:47 UTC
What graphics card are you using and what is the driver module you are using? 
Nvidia / Nouveau ?  AMD's Radeon maybe?
Comment 3 Peter Forsberg 2018-03-12 07:01:46 UTC
I have an Nvidia Quadro M1000M and I use Nouveau. The computer is a Lenovo p50.
// Peter
Comment 4 todd 2018-03-16 04:19:42 UTC
Created attachment 274771 [details]
hardware and driver info from nvidia driver / nouveau blacklist
Comment 5 todd 2018-03-16 04:25:44 UTC
Please check the following.


What is the output of this command when run with root access?
#dnf info xorg-x11-drv-nouveau.x86_64

Mine is:
Version      : 1.0.15
Release      : 3.fc27
Source       : xorg-x11-drv-nouveau-1.0.15-3.fc27.src.rpm


I need to figure out what to do and who to contact next, but for now I can say that either booting into runlevel 3 (multi-user.target), or blacklisting the nouveau driver and installing Nvidia's own driver downloaded directly from Nvidia, makes this suspend hang vanish. The second option is a bit of a hassle on Secure Boot systems, though.

I don't expect to keep this system set up with Nvidia's driver, because the modules have to be re-signed every time there is a kernel update. Sure, I can use dnf -x kernel* to block the updates, but then something like Meltdown/Spectre happens and forces a kernel update. So I just gave up and let nouveau have the hardware, because I only game on Windows anyway. For now I'm opting for runlevel 3.
If you want to install Nvidia's drivers, you might want to image your whole HDD first, so that if you hit a severe problem, or get tired of re-signing the modules, your life will be easier.

To change runlevel check the additional notes part of my first post.

The main guide for manually installing the Nvidia driver is below. (If you are running a Secure Boot system, see the following three URLs before proceeding to step 2.8.2 of the guide.)
https://www.if-not-true-then-false.com/2015/fedora-nvidia-guide/


If you do not want to turn off Secure Boot, look into the following information before you proceed to step 2.8.2 of the guide listed above.

Go to the Fedora docs here for info on creating your keys:
https://docs.fedoraproject.org/f27/system-administrators-guide/kernel-module-driver-configuration/Working_with_Kernel_Modules.html#sect-signing-kernel-modules-for-secure-boot
Keep in mind that there is a small error in part of it. The error is discussed here (https://bugzilla.redhat.com/show_bug.cgi?id=1509714). Use <keyctl list %:.builtin_trusted_keys> and <keyctl list %:.secondary_trusted_keys> instead.


For automatically signing your Nvidia modules, go to the section "Signing the NVIDIA Kernel Module" at the following URL.
(Go to this section specifically for step 2.8.2 of the guide above.)
http://us.download.nvidia.com/XFree86/Linux-x86_64/384.90/README/installdriver.html
Comment 6 todd 2018-03-16 04:36:19 UTC
p.s.
Check the file /etc/inittab for full information on changing your runlevel back and forth.
Comment 7 Peter Forsberg 2018-03-17 00:14:30 UTC
Hi Todd,
thanks for all the info.

Here is my output:
[se26975@peterfp50 ~]$ sudo dnf info xorg-x11-drv-nouveau.x86_64
[sudo] password for se26975: 
Last metadata expiration check: 0:45:52 ago on Sat 17 Mar 2018 12:17:50 AM CET.
Installed Packages
Name         : xorg-x11-drv-nouveau
Epoch        : 1
Version      : 1.0.15
Release      : 3.fc27
Arch         : x86_64
Size         : 215 k
Source       : xorg-x11-drv-nouveau-1.0.15-3.fc27.src.rpm
Repo         : @System
From repo    : fedora
Summary      : Xorg X11 nouveau video driver for NVIDIA graphics chipsets
URL          : http://www.x.org
License      : MIT
Description  : X.Org X11 nouveau video driver.

In fact I had been planning to install the Nvidia driver anyway, since I plan to run some CUDA applications, so I did that using the Negativo17 repository:
https://negativo17.org/nvidia-driver/

My initial install (no CUDA yet) after adding the repo:
sudo dnf install nvidia-driver dkms-nvidia nvidia-settings kernel-devel

After this install, suspend works flawlessly with the latest available Fedora 27 kernel (4.15.8). I don't use Secure Boot, so signing drivers is not an issue for me.

I'll keep monitoring this bug and can help out with providing information from my system if requested for troubleshooting. 

// Peter
Comment 8 todd 2018-03-17 04:12:42 UTC
Right, so that Nouveau driver source you listed looks like it has not been updated since the release of Fedora 27. What I could find shows that driver as the only one listed, unless you go back to a previous version of Fedora.

That kind of makes it look to me a little like something in the kernel has changed to cause this problem. Because apparently there has been no change in the video driver. Yet the problem resides with the video driver.

Fishy.

Glad you got a good workaround for ya. Might be a while before we get a fix.

On the other hand, a fix may just land in our laps in the near future. The changelog for kernel 4.15.10 mentions the word "suspend" 39 times.
See kernel.org for that file if you like.
Comment 9 Raphael Fontoura 2018-03-18 11:39:42 UTC
Created attachment 274801 [details]
result command $dmesg | grep -i acpi
Comment 10 Raphael Fontoura 2018-03-18 11:48:06 UTC
I have the same problem on an HP laptop with Intel graphics, running Fedora 27 on x86_64.

This problem does not happen on kernel 4.14.

It happens whether I push the power button, close the laptop lid, ask the system to suspend from the graphical target, or run the command echo mem > /sys/power/state.

The system appears to go into suspend or hibernate mode, but the power button does not blink. When I push the power button to wake the system, nothing happens. I need to hold the button for 10 seconds to power down the PC.

It happens on kernel 4.15.x (tested on 4.15.6, 4.15.7 and 4.15.9).
Comment 11 todd 2018-03-22 21:46:26 UTC
just updated kernel to 4.15.10 and there is no change in the problem.

@ Raphael.. Do you know if you're using the nouveau driver? If so would you list the output of 
#dnf info xorg-x11-drv-nouveau

Thanks
Comment 12 Raphael Fontoura 2018-03-24 14:24:06 UTC
(In reply to todd from comment #11)
> just updated kernel to 4.15.10 and there is no change in the problem.
> 
> @ Raphael.. Do you know if you're using the nouveau driver? If so would you
> list the output of 
> #dnf info xorg-x11-drv-nouveau
> 
> Thanks
@ todd

Well, I'm using the Intel driver. But I ran this command (dnf info xorg-x11-drv-nouveau) and some other commands. The results are attached.

Thanks for your reply.
Comment 13 Raphael Fontoura 2018-03-24 14:25:29 UTC
Created attachment 274901 [details]
uname -r && lsmod && dnf info xorg
Comment 14 Raphael Fontoura 2018-03-24 14:31:06 UTC
Created attachment 274903 [details]
dnf info xorg-x11-drv-nouveau and intel
Comment 15 todd 2018-03-25 04:31:54 UTC
(In reply to Raphael Fontoura from comment #14)
> Created attachment 274903 [details]
> dnf info xorg-x11-drv-nouveau and intel

It looks like your video driver is not nouveau. Looks like xorg-x11-drv-intel-2.99.917-31.20171025.fc27.src.rpm.

Anyway, thanks for heads up. The more we know the better.
Have you tried setting the runlevel to 3? It completely fixes the problem for me, but you get a text-based login. Otherwise you'll probably have to see if you can find a different driver, or use the old kernel.

My first post talks about the runlevel and you can get more info from the file /etc/inittab
Comment 16 todd 2018-04-03 20:41:48 UTC
Also realized that you can work around the bug while leaving the runlevel at the default: just restart the X window server once logged in.
For example, with the Fedora Xfce spin I use

systemctl restart lightdm.service

That takes care of the problem until reboot so if you want you could write a script.
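
A minimal sketch of what such a script could look like. The helper names are hypothetical, and it assumes that /etc/systemd/system/display-manager.service is a symlink to the active display manager's unit file. It only prints the restart command, since actually restarting the display manager needs root and ends the running graphical session.

```shell
#!/bin/sh
# Hypothetical helpers (not part of systemd or Fedora).

# Print the unit name for a display-manager.service symlink target,
# e.g. /usr/lib/systemd/system/lightdm.service -> lightdm.service
dm_unit_from_link() {
    basename "$1"
}

# Print the restart command for the given symlink target.
# Falls back to lightdm.service (the unit named in this comment)
# when no target is given.
restart_dm_command() {
    target=${1:-/usr/lib/systemd/system/lightdm.service}
    echo "systemctl restart $(dm_unit_from_link "$target")"
}

# Assumption: display-manager.service is a symlink to the active
# display manager's unit. If readlink finds nothing, the fallback applies.
restart_dm_command "$(readlink /etc/systemd/system/display-manager.service 2>/dev/null)"
```

On the Xfce spin described above, this should print the same systemctl restart lightdm.service command.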
Comment 17 Alex Tucker 2018-04-28 14:50:10 UTC
I appear to have the same problem, where suspend was working up to kernel 4.14.18 and hangs going into suspend on any newer kernels, 4.15.2 up, running on Fedora 27.

In my case, I have an NVIDIA Quadro FX 880M in my DELL M4500 laptop and am using the nouveau drivers.

Following the above workaround using the proprietary NVIDIA drivers I am able to successfully suspend and resume on kernel 4.16.3-200.fc27.x86_64.

I have attempted a git bisect between kernel versions 4.14.18 and 4.15.2 and reached the following:

d7722134b8254bcee6086230723814cddf9ab54b is the first bad commit
commit d7722134b8254bcee6086230723814cddf9ab54b
Author: Ben Skeggs <bskeggs@redhat.com>
Date:   Wed Nov 1 03:56:20 2017 +1000

    drm/nouveau: switch over to new memory and vmm interfaces
    
    Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

:040000 040000 8742552205a09192819ad38c2a47c3b9fe50ff81 b17bee448a21dd6e6273a1bb45fee939c5348ff5 M      drivers
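
For anyone wanting to repeat a bisect like the one above, here is a rough sketch. The bisect commands are shown only as comments and assume a kernel git tree that contains both version tags. The first_bad_commit helper is hypothetical; it simply pulls the hash out of git's "is the first bad commit" line.

```shell
#!/bin/sh
# Commands of the kind used for such a bisect (comments only; they assume
# a kernel git tree that contains both tags):
#   git bisect start
#   git bisect bad v4.15.2
#   git bisect good v4.14.18
#   ...build, boot, test suspend, then mark each step with
#   "git bisect good" or "git bisect bad" until git names a commit.

# first_bad_commit: hypothetical helper. Reads git bisect output on stdin
# and prints the 40-character hash from the "is the first bad commit" line.
first_bad_commit() {
    sed -n 's/^\([0-9a-f]\{40\}\) is the first bad commit$/\1/p'
}

# Example using the first two lines of the output quoted above.
first_bad_commit <<'EOF'
d7722134b8254bcee6086230723814cddf9ab54b is the first bad commit
commit d7722134b8254bcee6086230723814cddf9ab54b
EOF
```

The printed hash is the one that would be handed to git revert when testing whether backing out the commit helps.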
Comment 18 Chen Yu 2018-05-04 08:34:50 UTC
Thanks for your bisect result, Alex.

Todd, Raphael, do you have a chance to verify whether reverting the commit helps (this might need recompiling the kernel)? Or you can simply blacklist all the graphics drivers to confirm whether this issue is in the graphics driver (but you might get a black screen after resume).
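
A minimal sketch of the blacklisting test suggested above, assuming the usual modprobe.d mechanism. The helper name blacklist_conf and the file name mentioned in the comments are hypothetical. The script only prints the configuration lines; it does not write any file.

```shell
#!/bin/sh
# blacklist_conf: hypothetical helper that prints one modprobe.d
# "blacklist" line per module name given.
blacklist_conf() {
    for mod in "$@"; do
        printf 'blacklist %s\n' "$mod"
    done
}

# Print the line for nouveau. As root, the output could be saved to a file
# such as /etc/modprobe.d/blacklist-nouveau.conf (hypothetical name).
# Assumption: if the driver is also included in the initramfs, the
# initramfs may need to be regenerated before the blacklist takes effect.
blacklist_conf nouveau
```

For a one-off test without editing files, the modprobe.blacklist=nouveau kernel command-line parameter is another option. Either way, expect a degraded or blank display while the driver is disabled.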
Comment 19 Patrice 2018-05-05 09:09:18 UTC
Hello,

I am having the exact same issue on an Intel Q6600, under Fedora 28.

kernel: 4.16.5-300.fc28.x86_64

In a fresh Fedora 28 install (which has runlevel 5 as default), suspend blanks the screen and disables any input, forcing a hard restart using the power button.
Reproducible: always

I am using nouveau:

Name         : xorg-x11-drv-nouveau
Epoch        : 1
Version      : 1.0.15
Release      : 4.fc28
Arch         : x86_64
Size         : 229 k
Source       : xorg-x11-drv-nouveau-1.0.15-4.fc28.src.rpm
Repo         : @System
From repo    : anaconda
Summary      : Xorg X11 nouveau video driver for NVIDIA graphics chipsets
URL          : http://www.x.org
License      : MIT
Description  : X.Org X11 nouveau video driver.

Setting the default runlevel to 3 and launching X with startx fixes the issue, which is perfectly acceptable for me as a workaround. Thank you very much @todd for the hint.

I'd be glad to provide additional info if it can help.
Comment 20 Patrice 2018-05-05 09:56:56 UTC
the video card is a GeForce GTS 250
Comment 21 Paul van Maaren 2018-05-06 17:24:08 UTC
I have the same problem on an Acer E13 (ES1-311-C4Q6), suspend was working as expected on 4.14.18 and hangs on all newer kernels I have tried. Was running Fedora 27, just upgraded to 28 and no change.

Working driver:

# modinfo -F filename `lshw -c video | awk '/configuration: driver/{print $2}' | cut -d= -f2`
/lib/modules/4.14.18-300.fc27.x86_64/kernel/drivers/gpu/drm/i915/i915.ko.xz
Comment 22 todd 2018-05-08 05:17:58 UTC
@Chen,

I'm going to be tied up for a few weeks, due to a larger than normal workload. So I won't have time to mess with this bug right now.
In the meantime I will try blacklisting all video drivers and see what happens, although I suspect it will act the same as when the X display manager is disabled by setting multi-user.target.
But I know it won't be any time this week. Maybe mid next week if I'm lucky.


For the time being though, there are 3 work-arounds listed here that fix the problem.

1) Change runlevel from 5 to 3 (permanent fix - see opening post + comment #6).

2) Restart your X display manager once you log in. For instance Xfce spins use Lightdm.  (Temporary fix that lasts until the system is rebooted).
 * See comment #16

3) Install another video driver and blacklist Nouveau (permanent fix but a big hassle if you run secure boot).
 * see comment #5 for manual Nvidia driver install or see comment #7 for external repository to install Nvidia driver.

I personally don't know anything about the repo listed in comment #7. Whereas in comment #5, you are downloading directly from Nvidia themselves.
Comment 23 todd 2018-05-08 05:28:00 UTC
I'm also curious what desktop environment you guys are running?

I.E. 
Awesome, Cinnamon, Enlightenment, GNOME, KDE Plasma, LXDE, MATE, Openbox, Ratpoison, Xfce, ETC.

Thank you.
Comment 24 Paul van Maaren 2018-05-08 06:31:40 UTC
My laptop does not have Nvidia, and I still have the same issue. I am using F28 and KDE Plasma.
Comment 25 todd 2018-05-08 07:10:02 UTC
I wish they let us edit our posts here. 

I realized I made a mistake in number 3 of comment #22.

It should say "your system's default video driver" instead of Nouveau. Also, if you're using a GPU other than Nvidia, then you will obviously need to find an Intel or AMD driver you can attempt to install.

Sorry for the confusion. 


@paul. 
Thanks for info.
Comment 26 Chen Yu 2018-05-08 07:12:19 UTC
If this is closely tied to the video driver, then the freedesktop bugzilla might be a better place to track this bug down, since I've seldom seen graphics driver people jump into the kernel bugzilla to track issues, and they prefer using freedesktop (at least for i915) AFAIK.
Comment 27 todd 2018-05-08 08:36:19 UTC
@Chen

After seeing a few responses from people I'm starting to see that this is definitely not a video driver specific issue. Not everybody is using Nouveau. I suspected it wasn't from the start because the default generic video driver for Nvidia (Nouveau) has not changed from the start of fedora 27 to the end of fedora 27, and the problem didn't start until an update somewhere in the middle.

So Nouveau was never updated, yet the problem started with a newer kernel. But that could be a coincidence. So I just checked the update history on this machine and found that on 2018-02-17 there was a kernel update to 4.15.3 and there was also another important update that day which could very well be the problem.

xorg-x11-server-Xorg
and 
xorg-x11-server-common

I had been out of town for a while and thus missed all updates from 4.14.16 to 4.15.3 So any packages between then I don't have a listing for.

With that in mind, I remember that the problem does not exist if you restart X, or if you boot into text mode and then manually run startx. Plus, because Paul just stated that he is using KDE, I guess that pretty much eliminates the desktop environment, because I use Xfce.

So I think you're right. We need to report this over at freedesktop bugzilla as well. 

Great, more work :(
I already started a post at bugzilla fedora; they said it's likely kernel related, so my post was closed with a note to post at bugzilla kernel. Now I gotta start a 3rd one.
If I had lots of time I wouldn't mind. Perhaps we can con you into starting one over there. If you do, please post a link to it. Otherwise, when I get more time, I will.
Comment 28 Paul van Maaren 2018-05-08 09:42:16 UTC
I will change the runlevel to text and see if it resolves the problem. I assume I can test by typing: "echo mem > /sys/power/state" ?
Comment 29 Paul van Maaren 2018-05-08 18:58:17 UTC
I just did the runlevel test and it does not make a difference for me. I did:

ln -sf  /usr/lib/systemd/system/multi-user.target /etc/systemd/system/default.target

and a reboot to 4.16.6-302 and ran:

echo mem > /sys/power/state

The power light stayed on and it does not wake up. I tried the same after booting to 4.14.18-300, and that worked fine.
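
Before testing with echo mem > /sys/power/state, it can help to confirm that the kernel lists mem as a supported sleep state. A minimal sketch follows. The helper name supports_state is hypothetical, and the script only reads /sys/power/state; it never writes to it.

```shell
#!/bin/sh
# supports_state: hypothetical helper.
#   $1 = sleep state to look for (e.g. mem)
#   $2 = contents of /sys/power/state (space-separated list of states)
# Returns 0 if the state is listed, 1 otherwise.
supports_state() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Read-only check against the running kernel, if the file is available.
if [ -r /sys/power/state ]; then
    states=$(cat /sys/power/state)
    if supports_state mem "$states"; then
        echo "mem is listed in /sys/power/state: $states"
    else
        echo "mem is NOT listed in /sys/power/state: $states"
    fi
fi
```

If mem is listed and the write still hangs the machine, that points at the suspend path itself rather than at a missing sleep state.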
Comment 30 Patrice 2018-05-08 19:51:35 UTC
I can't access the faulty machine until next Saturday - but it runs Fedora28's latest Xfce .
Comment 31 todd 2018-05-09 00:06:42 UTC
@Paul

Use the correct method to set your run levels.

The original post tells you how and comment #6 even tells you where to get the information.

read the file....   /etc/inittab   

The correct way is .....

  systemctl set-default multi-user.target  (for text mode / runlevel 3)

  systemctl set-default graphical.target (for graphical mode / runlevel 5)

Read the file /etc/inittab please.
Comment 32 todd 2018-05-09 00:10:57 UTC
@Paul

p.s. Almost forgot to mention: again, please use the systemctl method. So to test your system's suspend, use:

systemctl suspend
Comment 33 Paul van Maaren 2018-05-09 04:28:43 UTC
OK, tried that. Same result...
Comment 34 todd 2018-05-10 04:08:16 UTC
Did you delete all your symlinks?

Also, it sounds like you might just have done a "dnf system-upgrade reboot" rather than a clean install? Fedora's system-upgrade is a total and complete P.o.S., most often causing more problems than it's worth.

I have an idea of some things I need to test but I don't have time ATM to perform them. I'll post my results when I can get away from work long enough to run them. Could be a couple weeks.

In the mean time, all I can say that might possibly help you is that I really think you need to at least try a clean install. Make sure to zero your drive first. 
Perhaps you already know, but after a file is deleted it remains on the drive until at some point the physical space the file was using becomes overwritten by a new file. Plus when simply formatting a drive, it does not delete the old files on the drive. Even after you install a new OS the old files can still be recovered, so long as the physical drive space the original file was residing on is not overwritten.


If you care to run one of the tests I plan to run but don't have time right now then feel free.


**BE SURE TO IMAGE YOUR DRIVE FIRST IN CASE THIS HOSES YOUR INSTALLED OS's.***
Or don't try it at all if you don't back up all your shit first and expect this to work perfectly for you. This is an experiment!!


Use....

dnf --showduplicates list xorg-x11-server-Xorg

and 

dnf --showduplicates list xorg-x11-server-common

Use those to enable you to downgrade the X11 server packages to one that was being used when this problem did not occur.
Specifically version 1.19.5-1.fc27.
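
A minimal sketch of how the downgrade command could be put together once the listing shows the wanted version. The helper name downgrade_command is hypothetical, and the script only prints the dnf command. Whether version 1.19.5-1.fc27 is still offered by the enabled repositories is an assumption that the --showduplicates output has to confirm first.

```shell
#!/bin/sh
# downgrade_command: hypothetical helper.
#   $1      = version-release to go back to (e.g. 1.19.5-1.fc27)
#   $2...   = package names
# Prints a "dnf downgrade" command naming each package at that version.
downgrade_command() {
    ver=$1
    shift
    cmd="dnf downgrade"
    for pkg in "$@"; do
        cmd="$cmd $pkg-$ver"
    done
    echo "$cmd"
}

# Print (do not run) the command for the two X server packages named above.
downgrade_command 1.19.5-1.fc27 xorg-x11-server-Xorg xorg-x11-server-common
```

The printed command has to be run as root, and the backup warnings in this comment apply before running it.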


**BE SURE TO IMAGE YOUR DRIVE FIRST IN CASE THIS HOSES YOUR INSTALLED OS's.***
Or don't try it at all if you don't back up all your shit first and expect this to work perfectly for you. This is an experiment!!

Either way, good luck.
Comment 35 todd 2018-05-10 04:27:39 UTC
p.s.
As you can see, that version listed above is for fedora 27, not 28 so you will need to re-install fedora 27 first.
Comment 36 Samuel Sieb 2018-06-12 19:39:42 UTC
This is resolved for me with the 4.16 kernel after upgrading to F28.  And the Fedora bug has been closed as well with someone else saying the same thing.
Comment 37 Alex Tucker 2018-06-14 12:46:19 UTC
Thanks Samuel. After updating to kernel-4.16.14-300.fc28.x86_64 and moving back to Nouveau, rather than using the workaround with the NVIDIA driver, suspend and resume now work again for me too.

Would you be able to provide a link to the Fedora bug you mentioned?
Comment 39 Chen Yu 2018-11-09 02:51:49 UTC
@todd, is there any chance that you can test if the fc28 works for you?
