Bug 176961

Summary: resume from suspend not working with NVIDIA 980 Ti, drivers 352 - 370, kernels 3.16 - 4.4
Product: Drivers
Reporter: emailjonathananderson-fedora
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: NEEDINFO
Severity: normal
CC: emailjonathananderson-fedora, rui.zhang
Priority: P1
Hardware: All
OS: Linux
URL: https://devtalk.nvidia.com/default/topic/919984/linux/resume-from-suspend-not-working-with-980-ti-drivers-352-370-kernels-3-16-4-4/
Kernel Version: 3.16 - 4.4
Subsystem:
Regression: No
Bisected commit-id:

Description emailjonathananderson-fedora 2016-10-07 14:16:03 UTC
I have a system built on an Asus Z170M-PLUS with an EVGA 980 Ti Hybrid.
The system boots without any problems visible to me.
It was built for GPU computing and does not show any problems under load.
It crashes on roughly 50% of resumes from suspend.
When it crashes on resume, the screen turns on but stays black.
Logging in over ssh often, but not always, works, and shows an Xorg process at 100% CPU.

I have run both Linux Mint (17.3 through 18) and Fedora on it, with kernels from 3.x up to 4.4, and cycled through NVIDIA drivers 350.xx to 370, and I still see this problem.
I can't try nouveau as the card is not supported there yet.

If I don't let the machine suspend, I have not seen this problem.

Discussions on the NVIDIA forum indicate that many newer NVIDIA cards in the 9xx and 10xx families might suffer from this, but NVIDIA has shown little interest.
Comment 1 emailjonathananderson-fedora 2016-10-07 14:16:32 UTC
Latest test with kernel 4.4 and driver 370.28
The display is black on resume from suspend. The system is responsive enough to allow a remote login over ssh, so I can collect diagnostics.

Xorg is hanging with 100% cpu.

lsmod shows that nouveau is NOT loaded.
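
For anyone repeating this check over ssh, here is a minimal sketch in Python. It assumes the standard `/proc/modules` format, where the module name is the first field on each line, which is the same data `lsmod` prints. The helper names `loaded_modules` and `is_loaded` are made up for illustration.

```python
from pathlib import Path


def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            # The first whitespace-separated field is the module name.
            names.add(fields[0])
    return names


def is_loaded(name, proc_modules_text):
    """True if a module with exactly this name appears in the text."""
    return name in loaded_modules(proc_modules_text)


if __name__ == "__main__":
    modules_file = Path("/proc/modules")
    if modules_file.exists():
        text = modules_file.read_text()
        for module in ("nouveau", "nvidia", "nvidia_modeset"):
            state = "loaded" if is_loaded(module, text) else "not loaded"
            print(f"{module}: {state}")
```

Matching on the exact name avoids a false positive from a plain substring search, where `nvidia` would also match `nvidia_modeset`.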

Potentially relevant excerpts from the logs:
[ 138.404188] nvidia-modeset: WARNING: GPU:0: Lost display notification; continuing.
[ 140.406359] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[ 140.406420] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s31f6: link becomes ready
[ 181.619021] nvidia-modeset: WARNING: GPU:0: Unable to read EDID for display device Samsung S24D390 (HDMI-0)
[ 389.665921] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000957d:0:0:0x00000040
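
To pull only the nvidia-modeset messages out of a longer kernel log, something like the following Python sketch may help. It assumes the lines look like the excerpts above (a bracketed timestamp in seconds, then `nvidia-modeset:`, then `WARNING` or `ERROR`). The function name `nvidia_modeset_events` is hypothetical, and other driver versions may format their messages differently.

```python
import re

# Matches lines such as:
# [ 138.404188] nvidia-modeset: WARNING: GPU:0: Lost display notification; continuing.
LINE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+nvidia-modeset:\s+"
    r"(?P<level>WARNING|ERROR):\s+(?P<msg>.*)$"
)


def nvidia_modeset_events(log_text):
    """Return (timestamp, level, message) tuples for nvidia-modeset lines."""
    events = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            events.append(
                (float(match.group("ts")), match.group("level"), match.group("msg"))
            )
    return events
```

The text passed in can be saved `dmesg` output. Lines from other subsystems, such as the e1000e and IPv6 messages above, are skipped, so the time between the first warning and the final error is easier to see.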
Comment 3 Zhang Rui 2016-10-10 05:55:37 UTC
This sounds like a graphics driver issue to me.
Does the problem still exist in text mode, without the graphics driver loaded?
Comment 4 emailjonathananderson-fedora 2016-10-25 12:06:03 UTC
Good question.
Could you please tell me how to test this?
Comment 5 emailjonathananderson-fedora 2016-11-10 18:01:46 UTC
(In reply to Zhang Rui from comment #3)
> This sounds like a graphics driver issue to me.
> Does the problem still exist in text mode, without the graphics driver
> loaded?

Zhang, could you please tell me how to test this in text mode?