Bug 201867 - Nouveau + discrete GPU (GP104M = GTX 1070 M) Driver Crashes, System freezes, dual screen not working
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: x86-64 Linux
Importance: P1 high
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-12-04 00:29 UTC by david.kremer.dk
Modified: 2019-03-19 14:52 UTC
CC List: 3 users

See Also:
Kernel Version: 4.19.4-arch1-1-ARCH #1 SMP PREEMPT
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Driver Traceback (6.50 KB, text/plain)
2018-12-04 00:29 UTC, david.kremer.dk

Description david.kremer.dk 2018-12-04 00:29:24 UTC
Created attachment 279831
Driver Traceback

The problem takes several forms. It is well described in https://bugzilla.kernel.org/show_bug.cgi?id=156341 so this could be a duplicate; however, since the driver that is supposed to drive the NVIDIA GPU is nouveau, I feel obliged to file a new bug report to address the issue.

So basically, there are two issues:

1. When loading the nouveau driver in a tty, you can't launch Xorg afterward: the xinit program waits for a server to become available, but that never happens.
2. When loading the nouveau driver after the X session has started, the fan finally switches off, but first there is a traceback in dmesg, and then everything reported in bug 156341 happens: subsequent calls to lspci/lshw/... freeze the system for good, with no way to recover except a hard reboot.

So there's definitely something wrong with this family of NVIDIA GPUs when integrated as a discrete graphics card in this new laptop generation.

I can't give much more info.

See attachment for the traceback.
 
lspci output:

00:00.0 Host bridge: Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers (rev 07)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) (rev 07)
00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 630 (Mobile)
01:00.0 VGA compatible controller: NVIDIA Corporation GP104M [GeForce GTX 1070 Mobile] (rev ff)

CPU=Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
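
For triage, the `lspci` listing above can be checked mechanically. The sketch below is only an illustration, not part of the original report: it parses a saved copy of the output (it does not call `lspci` itself, since that is what triggers the freeze) and flags VGA controllers reported as `rev ff`. Treating `rev ff` as "config space not readable, device probably powered down" is a common interpretation and an assumption here; the function name `find_vga_controllers` and the regular expression are hypothetical helpers.

```python
import re

# Matches lines in the short "lspci" format without a PCI domain prefix, e.g.
# "01:00.0 VGA compatible controller: NVIDIA Corporation GP104M [...] (rev ff)"
LINE_RE = re.compile(
    r"^(?P<slot>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+"
    r"(?P<cls>[^:]+):\s+"
    r"(?P<name>.*?)"
    r"(?:\s+\(rev (?P<rev>[0-9a-f]{2})\))?$",
    re.IGNORECASE,
)

SAMPLE = """\
00:00.0 Host bridge: Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers (rev 07)
00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 630 (Mobile)
01:00.0 VGA compatible controller: NVIDIA Corporation GP104M [GeForce GTX 1070 Mobile] (rev ff)
"""


def find_vga_controllers(lspci_text):
    """Return one dict per VGA controller found in saved lspci output."""
    found = []
    for line in lspci_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m or m.group("cls") != "VGA compatible controller":
            continue
        rev = m.group("rev")  # None when lspci printed no revision
        found.append({
            "slot": m.group("slot"),
            "name": m.group("name"),
            "rev": rev,
            # Assumption: "ff" means the revision register read back as all ones.
            "unreachable": rev is not None and rev.lower() == "ff",
        })
    return found


if __name__ == "__main__":
    for dev in find_vga_controllers(SAMPLE):
        print(dev["slot"], dev["rev"], dev["unreachable"])
```

On the listing in this report, only `01:00.0` (the GP104M) would be flagged.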
Comment 1 david.kremer.dk 2018-12-04 15:46:34 UTC
I must add that the symptoms, as well as the affected hardware, are starting to be pretty well documented.

The problem arises with:

- recent NVIDIA mobile cards
- Optimus technology built in
- Intel integrated GPU (what else?)

You can arrive at the same result using the `bbswitch` module or the `acpi_call` module to try to switch off the graphics card, *BUT* since it is really the `nouveau` driver's job to do the work, and since it is already trying to do it, I see no point in reporting bugs for those modules.
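
When collecting data for a report like this, it can help to record what the kernel's runtime power management thinks the discrete GPU is doing. The following is a minimal sketch, not something used in this report: it reads the standard `power/runtime_status` and `power/control` sysfs attributes of a PCI device. The slot `0000:01:00.0` is an assumption (the `01:00.0` address from the lspci output with domain `0000`), and the helper name `read_runtime_pm` is hypothetical. Whether reading these files is safe once the machine is in the broken state has not been verified here.

```python
from pathlib import Path


def read_runtime_pm(slot="0000:01:00.0", sysfs_root="/sys/bus/pci/devices"):
    """Read runtime PM attributes for one PCI device from sysfs.

    Returns a dict with the raw attribute contents, or None for any
    attribute that is missing or unreadable.
    """
    power_dir = Path(sysfs_root) / slot / "power"
    state = {}
    for attr in ("runtime_status", "control"):
        try:
            state[attr] = (power_dir / attr).read_text().strip()
        except OSError:
            # Device absent, attribute not exposed, or permission denied.
            state[attr] = None
    return state
```

Calling `read_runtime_pm()` before and after loading `nouveau`, and attaching both results together with the dmesg traceback, would show whether the card was reported as `suspended` at the moment the tools started to hang.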

See also:

- https://bugzilla.kernel.org/show_bug.cgi?id=156341
- https://github.com/Bumblebee-Project/Bumblebee/issues/1007
- https://github.com/Bumblebee-Project/Bumblebee/issues/764#issuecomment-234494238

for consistent ways of reproducing the undesired behaviour.

The cheap hack of passing specific values to the `acpi_osi` kernel parameter should be discouraged, as it does not work consistently across GPU models and laptop models.
Comment 2 Andrey Melentyev 2019-01-08 10:19:01 UTC
Possibly a duplicate of https://bugzilla.kernel.org/show_bug.cgi?id=200939 - although that ticket lacks a good description
