Bug 82711 - After update to kernel soft lockup (oops) and incomplete boot and shutdown fail
Summary: After update to kernel soft lockup (oops) and incomplete boot and shutdown fail
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: All Linux
Importance: P1 high
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-08-18 09:51 UTC by Mike Cloaked
Modified: 2016-06-05 03:26 UTC
CC List: 4 users

See Also:
Kernel Version: 3.16.1-1-ARCH
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
Systemd journal log after rebooting from crashed system (300.03 KB, text/x-log)
2014-08-18 09:51 UTC, Mike Cloaked
Journal log after cleaning out systemd journal and reboot (123.19 KB, text/plain)
2014-08-18 10:26 UTC, Mike Cloaked
dmesg after selecting reboot from kdm (67.89 KB, text/plain)
2014-08-18 18:59 UTC, Mike Cloaked
Systemd journal after failed reboot from kdm greeter (124.96 KB, text/plain)
2014-08-18 19:01 UTC, Mike Cloaked
dmesg log after incomplete boot as per comment #6 (124.50 KB, text/x-log)
2014-08-18 19:30 UTC, Mike Cloaked
systemd journal log after incomplete boot as per comment #6 (338.19 KB, text/plain)
2014-08-18 19:31 UTC, Mike Cloaked
systemd journal showing kernel oops with kernel 3.16.4 (135.12 KB, text/plain)
2014-10-07 09:47 UTC, Mike Cloaked

Description Mike Cloaked 2014-08-18 09:51:54 UTC
Created attachment 147021
Systemd journal log after rebooting from crashed system

After updating to kernel 3.16.1-1-ARCH, the Lenovo Y510p with hybrid Intel/Nvidia graphics usually fails to boot. Occasionally boot does complete, and in those cases partial logs were available from the systemd journal, even though I had to pull the power to shut the system down, which corrupts the journal files.

The log shows a CPU soft lockup, and the graphics card also appears to fail to initialise properly:

Aug 18 10:10:14 lenovo2 kernel: nouveau E[  PGRAPH][0000:01:00.0] init failed, -16
Aug 18 10:10:14 lenovo2 kernel: nouveau E[  PGRAPH][0000:01:00.0][0x0300e417][ffff880263bcfc00] engine failed, -16
Aug 18 10:10:14 lenovo2 kernel: nouveau E[  PGRAPH][0000:01:00.0][0x0000a097][ffff880261d45700] parent failed, -16
Aug 18 10:10:14 lenovo2 kernel: nouveau E[Xorg.bin[439]] 0xdddddddd:0xcccc0000 init failed with -16
Aug 18 10:10:14 lenovo2 kernel: nouveau E[Xorg.bin[439]] 0xffffffff:0xdddddddd init failed with -16
Aug 18 10:10:14 lenovo2 kernel: nouveau E[Xorg.bin[439]] 0xffffffff:0xffffffff init failed with -16
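
For anyone triaging similar journals, here is a minimal Python sketch that pulls the nouveau failure lines out of a log and names the error code. The function name `parse_nouveau_errors` and the regular expression are illustrative assumptions based only on the line shapes quoted above, not on any nouveau log specification. In standard Linux errno numbering 16 is EBUSY; whether the driver means "device or resource busy" here is an interpretation the logs do not confirm.

```python
import errno
import re

# Assumed line shape, taken from the six log lines quoted in this report:
#   "... nouveau E[<unit>...] ... <stage> failed, -16"
#   "... nouveau E[<unit>...] ... <stage> failed with -16"
NOUVEAU_ERROR = re.compile(
    r"nouveau E\[\s*(?P<unit>[\w.]+).*?"
    r"(?P<stage>\w+) failed(?:,| with) (?P<code>-\d+)\s*$"
)


def parse_nouveau_errors(lines):
    """Return (unit, stage, code, errno_name) for each matching line."""
    results = []
    for line in lines:
        match = NOUVEAU_ERROR.search(line)
        if match is None:
            continue  # not a nouveau failure line in the assumed shape
        code = int(match.group("code"))
        # Kernel code returns negative errno values, so negate for the lookup.
        name = errno.errorcode.get(-code, "unknown")
        results.append((match.group("unit"), match.group("stage"), code, name))
    return results


if __name__ == "__main__":
    sample = [
        "Aug 18 10:10:14 lenovo2 kernel: nouveau E[  PGRAPH][0000:01:00.0] init failed, -16",
        "Aug 18 10:10:14 lenovo2 kernel: nouveau E[Xorg.bin[439]] 0xdddddddd:0xcccc0000 init failed with -16",
    ]
    for entry in parse_nouveau_errors(sample):
        print(entry)
```

Feeding it the output of `journalctl -k` split into lines gives a quick summary of which unit and stage failed first.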

Graphics cards are:
00:02.0 VGA compatible controller: Intel Corporation 4th Gen Core Processor Integrated Graphics Controller (rev 06)
01:00.0 VGA compatible controller: NVIDIA Corporation GK107M [GeForce GT 750M] (rev a1)

I was not sure which component to select for this report, so I chose non-Intel DRI; that can be changed if necessary.

When the bug hits during boot, the VTs fill with log output. It is not clear that the logs contain all of the diagnostic information, but the attached log file is all that I was able to capture.
Comment 1 Mike Cloaked 2014-08-18 10:26:34 UTC
Created attachment 147031
Journal log after cleaning out systemd journal and reboot
Comment 2 Mike Cloaked 2014-08-18 18:58:03 UTC
After a normal-looking bootup and a normal KDE session, I logged out of KDE and used the kdm greeter screen to request a reboot. X appeared to exit, but the system hung with a VT visible and did not reboot.

At this point I captured the dmesg and journal files which I will now attach.
Comment 3 Mike Cloaked 2014-08-18 18:59:48 UTC
Created attachment 147111
dmesg after selecting reboot from kdm

After reboot was selected from the kdm greeter, the system failed to reboot but left a working VT from which a root login was possible.
Comment 4 Mike Cloaked 2014-08-18 19:01:08 UTC
Created attachment 147121
Systemd journal after failed reboot from kdm greeter

After logging out from kde and selecting reboot, the system exited X but hung with a VT screen. This is the journal log after logging in as root to the VT at that point.
Comment 5 Mike Cloaked 2014-08-18 19:08:10 UTC
After capturing the log files in comments #3 and #4 the system was commanded to power off using "systemctl poweroff".  The shutdown sequence started and was interrupted by a countdown timer displaying on the VT:

A stop job is running for K Display Manager (xxx xxx / 1min 30s)

Once the 90 seconds elapsed the system did complete the shutdown and poweroff.
Comment 6 Mike Cloaked 2014-08-18 19:29:44 UTC
After the system was powered off as in comment #5, the laptop was booted again and reached the kdm greeter. When I switched to a text VT and attempted to log in as root, the screen filled with log messages. I tried to capture the dmesg and journal logs while the system became increasingly slow to respond to the keyboard and the fans ramped up to high speed. I was able to capture the log files before the system became completely unresponsive, at which point it was necessary to hold the power button for a hard poweroff. The log files were then rsynced to a different computer, and I will attach these two logs next.
Comment 7 Mike Cloaked 2014-08-18 19:30:58 UTC
Created attachment 147131
dmesg log after incomplete boot as per comment #6
Comment 8 Mike Cloaked 2014-08-18 19:31:43 UTC
Created attachment 147141
systemd journal log after incomplete boot as per comment #6
Comment 9 abandoned account 2014-09-05 16:58:17 UTC
This may or may not apply, but could you check whether your video driver (nouveau?) was (re)compiled for the new kernel? I got a similar NULL pointer dereference when I recompiled the kernel and my fglrx driver (not nouveau in my case) failed to compile, so the previous module was kept.
Comment 10 Mike Cloaked 2014-09-05 19:42:23 UTC
The nouveau-dri package on my system was updated after I sent this report (three days after my last comment), and was therefore presumably compiled for the new kernel.

[2014-08-21 20:40] [PACMAN] upgraded nouveau-dri (10.2.5-1 -> 10.2.6-1)

The xf86-video-nouveau package was updated a little earlier, before the date of the original bug report.

[2014-07-29 12:37] [PACMAN] upgraded xf86-video-nouveau (1.0.10-2 -> 1.0.10-3)

However, since the nouveau-dri package was updated I have not had a repeat of the incomplete boot problem. I am continuing to test and will report back if the problem recurs.
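
To line up package updates against the date of the report, a short Python sketch along these lines can help. The helper name `upgrades_for` and the regular expression are assumptions modelled only on the two `[PACMAN] upgraded` lines quoted in this comment; other pacman versions may log in a different format.

```python
import re
from datetime import datetime

# Assumed shape, copied from the two log lines quoted in this comment:
#   [2014-08-21 20:40] [PACMAN] upgraded nouveau-dri (10.2.5-1 -> 10.2.6-1)
UPGRADE = re.compile(
    r"^\[(?P<when>\d{4}-\d{2}-\d{2} \d{2}:\d{2})\] \[PACMAN\] "
    r"upgraded (?P<pkg>\S+) \((?P<old>\S+) -> (?P<new>\S+)\)$"
)


def upgrades_for(log_lines, packages):
    """Return sorted (timestamp, package, old, new) tuples for the given packages."""
    found = []
    for line in log_lines:
        match = UPGRADE.match(line.strip())
        if match and match.group("pkg") in packages:
            when = datetime.strptime(match.group("when"), "%Y-%m-%d %H:%M")
            found.append(
                (when, match.group("pkg"), match.group("old"), match.group("new"))
            )
    return sorted(found)


if __name__ == "__main__":
    log = [
        "[2014-08-21 20:40] [PACMAN] upgraded nouveau-dri (10.2.5-1 -> 10.2.6-1)",
        "[2014-07-29 12:37] [PACMAN] upgraded xf86-video-nouveau (1.0.10-2 -> 1.0.10-3)",
    ]
    reported = datetime(2014, 8, 18, 9, 51)  # report timestamp from this bug
    for when, pkg, old, new in upgrades_for(log, {"nouveau-dri", "xf86-video-nouveau"}):
        relation = "before" if when < reported else "after"
        print(pkg, old, "->", new, relation, "the report")
```

On a real system the lines would typically come from /var/log/pacman.log.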
Comment 11 Mike Cloaked 2014-10-07 09:47:03 UTC
Created attachment 152771
systemd journal showing kernel oops with kernel 3.16.4

Tested with the latest kernel (linux 3.16.4-1) and the following Arch Linux packages:
xf86-video-nouveau 1.0.11-2
xf86-video-intel 2.99.916-3
mesa-dri 10.3.0-3

The system boots to the KDE greeter and I can log in without problems, but shutting down from KDE gives a delay of a minute or two before the screen shows the kernel oops and a message that a reboot is needed. I logged back in as root and captured the systemd journal at this point, before using systemctl to reboot. Reboot or shutdown then incurs a 1min 30s delay due to a running "stop job" before the system will shut down or reboot.
Comment 12 Mike Cloaked 2014-10-09 16:53:16 UTC
The only way I can get the laptop to behave sensibly on shutdown is to keep the nouveau module from loading, either by adding modprobe.blacklist=nouveau to the kernel command line at boot, or by adding a single line "install nouveau /bin/false" to the file /etc/modprobe.d/blacklist.conf.

Presumably this indicates a bug in the nouveau module for my graphics card. If there is a way to get diagnostics that would help pin this down so that a code fix can be found, please let me know what to do to generate suitable log files.
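
To confirm which of the two workarounds is actually in effect on a given boot, a small Python sketch such as the following can be used. The helper names `cmdline_blacklist` and `conf_disables` are hypothetical; the sketch only recognises the exact two forms mentioned in this comment and makes no attempt to cover every modprobe.d directive.

```python
def cmdline_blacklist(cmdline):
    """Return the module names listed in modprobe.blacklist= on a kernel command line."""
    modules = set()
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "modprobe.blacklist" and sep:
            # The parameter takes a comma-separated list of module names.
            modules.update(name for name in value.split(",") if name)
    return modules


def conf_disables(conf_text, module):
    """Return True if the text contains the line 'install <module> /bin/false'."""
    for raw in conf_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop trailing comments
        if fields[:3] == ["install", module, "/bin/false"]:
            return True
    return False


if __name__ == "__main__":
    # On a running system the command line can be read from /proc/cmdline.
    example_cmdline = "root=/dev/sda1 rw modprobe.blacklist=nouveau"
    example_conf = "install nouveau /bin/false\n"
    print("nouveau" in cmdline_blacklist(example_cmdline))
    print(conf_disables(example_conf, "nouveau"))
```

The example command line and file contents are made up for illustration; only the `modprobe.blacklist=nouveau` parameter and the `install nouveau /bin/false` line come from this report.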

The graphics cards on this machine are:
00:02.0 VGA compatible controller: Intel Corporation 4th Gen Core Processor Integrated Graphics Controller (rev 06)
01:00.0 VGA compatible controller: NVIDIA Corporation GK107M [GeForce GT 750M] (rev a1)
Comment 13 Christopher Crouse 2014-10-22 11:05:12 UTC
Possibly related to Bug #70354 and/or #85791?

Bug #85791
https://bugzilla.kernel.org/show_bug.cgi?id=85791

Bug #70354
https://bugs.freedesktop.org/show_bug.cgi?id=70354


This bug is affecting me too. Another option is to use only the integrated (Intel) card and disable the Nvidia card in the BIOS, which currently works for me.

Hope it helps!
