Bug 209129

Summary: HW related error message under Gnome important tab
Product: Drivers
Reporter: Laszlo (laszlo.a.toth)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: CLOSED INSUFFICIENT_DATA
Severity: high
CC: bp, imwellcushtymelike
Priority: P1
Hardware: Intel
OS: Linux
Kernel Version: 5.4.0-42-generic
Subsystem:
Regression: No
Bisected commit-id:
Attachments: Gnome log file under important tab
lower level noveau log records
complete dmesg

Description Laszlo 2020-09-02 15:51:58 UTC
Created attachment 292297 [details]
Gnome log file under important tab

Hi all,

Please find attached the error messages shown under the Important tab of the GNOME log viewer.
If you need additional information don't hesitate to ask.
Best regards,
 Laci.
Comment 1 Borislav Petkov 2020-09-02 15:57:59 UTC
Which of those 6 lines do you mean exactly? And even with those messages, your machine works ok?
Comment 2 Laszlo 2020-09-02 16:06:22 UTC
Hi,
I mean the following lines in the GNOME Important tab:
13:25:01 kernel: nouveau 0000:01:00.0: bsp: init failed, -2
13:25:01 kernel: nouveau 0000:01:00.0: bsp: init failed, -2
13:25:01 kernel: nouveau 0000:01:00.0: vp: init failed, -2

The PC freezes completely for 1-3 minutes and is unusable for web browsing, creating or reading LibreOffice documents, opening a terminal, starting a new app, and so on.
Comment 3 Borislav Petkov 2020-09-02 16:36:54 UTC
That's nouveau saying that some accelerators(?) could not be initialized. Does your PC recover after those 1-3 minutes?

Can you upload full dmesg?

Thx.
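
A sketch for narrowing a full log down to these messages, assuming a POSIX shell and a `sed` that supports `-E`. The function name `nouveau_init_failures` is made up for illustration; it reads a kernel log on stdin and counts nouveau "init failed" lines per engine and error code. Kernel drivers usually report negative errno values, so -2 most likely means -ENOENT ("No such file or directory"); what nouveau could not find is not stated in the lines quoted above.

```shell
# Hypothetical helper: read a kernel log on stdin and print one line per
# (engine, error code) pair, prefixed with how often it occurred.
nouveau_init_failures() {
    # Keep only lines like "nouveau 0000:01:00.0: bsp: init failed, -2"
    # and reduce each to "<engine> <code>".
    sed -n -E 's/.*nouveau [0-9a-f:.]+: ([a-z0-9]+): init failed, (-?[0-9]+).*/\1 \2/p' |
        sort | uniq -c
}
```

Typical use would be `dmesg | nouveau_init_failures` or `journalctl -k | nouveau_init_failures`.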
Comment 4 Laszlo 2020-09-02 17:34:46 UTC
Created attachment 292305 [details]
lower level noveau log records
Comment 5 Laszlo 2020-09-02 17:38:03 UTC
Created attachment 292307 [details]
complete dmesg
Comment 6 Borislav Petkov 2020-09-02 18:38:44 UTC
That GpuWatchdog process, where does it come from? I cannot find it in my Linux distros, and people reporting similar issues with it mention the proprietary NVIDIA drivers. Do you have them installed, by any chance?

If so, remove them - you're using nouveau - and try to reproduce then.

HTH.
Comment 7 Laszlo 2020-09-03 06:50:39 UTC
I didn't install any additional NVIDIA driver or GpuWatchdog. I think it came from the automatic Ubuntu 18.04 install or from a third party.

Do you mean I should remove GpuWatchdog and let nouveau manage the GPU?
Comment 8 Borislav Petkov 2020-09-03 09:08:52 UTC
Let's first see where it comes from. Do this as root:

# dpkg -S $(which GpuWatchdog)
Comment 9 Laszlo 2020-09-03 09:21:28 UTC
Ok, I am going to run the requested command.

The first GpuWatchdog record appeared when I started the Chrome browser
(reported as a segmentation fault).
This could be related, since I am not familiar with the SUID sandbox and this error message:

"Most likely you need to configure your SUID sandbox correctly"

11:12:46 Main: Most likely you need to configure your SUID sandbox correctly
11:12:46 Main: Most likely you need to configure your SUID sandbox correctly
11:12:46 Main: [0903/111246.159445:ERROR:nacl_helper_linux.cc(308)] NaCl helper process running without a sandbox!
11:12:46 Main: Opening in existing browser session.
Comment 10 Laszlo 2020-09-03 09:23:43 UTC
laci@sanyika:~$ sudo dpkg -S $(which GpuWatchdog)
[sudo] password for laci: 
dpkg-query: error: --search needs at least one file name pattern argument
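
Note on the error above: `which GpuWatchdog` evidently printed nothing, so `dpkg -S` was called with no argument at all. A guarded variant might look like the sketch below; `find_owner` is a hypothetical helper name, and the sketch assumes a Debian-style system where `dpkg` is available.

```shell
# Hypothetical helper: ask dpkg which package owns a program, but only if
# the name actually resolves to a file on PATH.
find_owner() {
    name=$1
    # command -v prints the full path for programs on PATH; for builtins or
    # shell functions it prints just the name, which the case below rejects.
    path=$(command -v "$name" 2>/dev/null) || path=
    case $path in
        /*) dpkg -S "$path" ;;
        *)  echo "$name: no such program on PATH; it may be a thread name rather than a binary" >&2
            return 1 ;;
    esac
}
```

Calling `find_owner GpuWatchdog` on a system without such a binary would then print the explanatory message instead of the dpkg-query usage error.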
Comment 11 Borislav Petkov 2020-09-03 09:49:07 UTC
Hmm, so it looks like that GpuWatchdog thing is a Chrome thread. And the error in your dmesg before it:

[  266.208179] [TTM] Buffer eviction failed
[  266.346883] GpuWatchdog[2984]: segfault at 0 ip

says that a buffer eviction in TTM failed, which probably causes the watchdog to segfault. In any case, this is a DRM issue, not a platform one. Reassigning...

What you could do for starters is try the latest kernel 5.8 to check whether this has been fixed in the meantime.

HTH.
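
One way to check the thread theory, as a sketch: on Linux each thread's name is exposed in /proc/<pid>/task/<tid>/comm, so scanning those files shows which process owns a thread called GpuWatchdog. The function name `threads_named` and its optional second argument (an alternative root directory, useful for trying it on a copy or a test tree) are hypothetical.

```shell
# Hypothetical helper: print "<pid> <process name>" for every process that
# has a thread whose name equals $1. $2 defaults to /proc.
threads_named() {
    want=$1
    root=${2:-/proc}
    for f in "$root"/[0-9]*/task/[0-9]*/comm; do
        # Threads can exit while we scan, so skip anything unreadable.
        [ -r "$f" ] || continue
        [ "$(cat "$f" 2>/dev/null)" = "$want" ] || continue
        piddir=${f%/task/*}          # strip "/task/<tid>/comm"
        printf '%s %s\n' "${piddir##*/}" "$(cat "$piddir/comm" 2>/dev/null)"
    done | sort -u
}
```

Running `threads_named GpuWatchdog` while the browser is open should list the owning process if the name really belongs to one of its threads.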
Comment 12 Laszlo 2020-09-03 10:27:10 UTC
Ok, then what shall I do?
Comment 13 Borislav Petkov 2020-09-03 10:59:53 UTC
> Ok, then what shall I do?

Did you not read this?

"What you could do for starters is try the latest kernel 5.8 to check whether this has been fixed in the meantime."
Comment 14 Laszlo 2020-09-03 11:28:48 UTC
Then please assign someone on the component this was reassigned to who can examine this bug, verify the root cause, and assess whether it has already been fixed.

They can then suggest the appropriate kernel and probably a driver.

I am a user who needs some support during the fix procedure.
Comment 15 Ken Sharp 2023-12-08 08:00:56 UTC
I've opened a bug report at Launchpad for this but cannot say for certain there is a kernel bug here.

https://bugs.launchpad.net/ubuntu/+source/linux-signed-hwe-6.2/+bug/2045951