Bug 209129 - HW related error message under Gnome important tab
Summary: HW related error message under Gnome important tab
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: Intel Linux
Importance: P1 high
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-09-02 15:51 UTC by Laszlo
Modified: 2023-12-08 08:00 UTC
CC List: 2 users

See Also:
Kernel Version: 5.4.0-42-generic
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Gnome log file under important tab (481 bytes, text/plain)
2020-09-02 15:51 UTC, Laszlo
lower level nouveau log records (5.15 KB, text/plain)
2020-09-02 17:34 UTC, Laszlo
complete dmesg (61.97 KB, text/plain)
2020-09-02 17:38 UTC, Laszlo

Description Laszlo 2020-09-02 15:51:58 UTC
Created attachment 292297
Gnome log file under important tab

Hi all,

Please find attached the error messages shown under the Important tab of the Gnome log manager.
If you need additional information, don't hesitate to ask.
Best regards,
 Laci.
Comment 1 Borislav Petkov 2020-09-02 15:57:59 UTC
Which of those 6 lines do you mean exactly? And does your machine work OK even with those messages?
Comment 2 Laszlo 2020-09-02 16:06:22 UTC
Hi,
I mean the following lines in the Gnome Important tab:
13:25:01 kernel: nouveau 0000:01:00.0: bsp: init failed, -2
13:25:01 kernel: nouveau 0000:01:00.0: bsp: init failed, -2
13:25:01 kernel: nouveau 0000:01:00.0: vp: init failed, -2

The PC freezes completely for 1-3 minutes and is unusable for web browsing, creating or reading LibreOffice documents, opening a terminal, starting a new app, and so on.
Comment 3 Borislav Petkov 2020-09-02 16:36:54 UTC
That's nouveau saying that some accelerators (?) cannot be initialized, or something along those lines. Does your PC recover after those 1-3 minutes?

Can you upload the full dmesg?

Thx.
Comment 4 Laszlo 2020-09-02 17:34:46 UTC
Created attachment 292305
lower level nouveau log records
Comment 5 Laszlo 2020-09-02 17:38:03 UTC
Created attachment 292307
complete dmesg
Comment 6 Borislav Petkov 2020-09-02 18:38:44 UTC
That GpuWatchdog process, where does it come from? I cannot find it in my Linux distros, and people are reporting similar issues with it and mentioning the proprietary NVIDIA drivers. Do you have them installed, by any chance?

If so, remove them - you're using nouveau - and try to reproduce then.

HTH.
Comment 7 Laszlo 2020-09-03 06:50:39 UTC
I didn't install any additional driver for NVIDIA or GpuWatchdog. I think it came from the automatic Ubuntu 18.04 install or from a third party.

Do you mean I should remove GpuWatchdog and let nouveau manage the GPU?
Comment 8 Borislav Petkov 2020-09-03 09:08:52 UTC
Let's first see where it comes from. Do this as root:

# dpkg -S $(which GpuWatchdog)
Comment 9 Laszlo 2020-09-03 09:21:28 UTC
OK, I am going to run the requested command.

The first GpuWatchdog record appeared when I started the Chrome browser (flagged as a segmentation fault).
This could be true, since I am not familiar with the SUID sandbox and this error message:

"Most likely you need to configure your SUID sandbox correctly"

11:12:46 Main: Most likely you need to configure your SUID sandbox correctly
11:12:46 Main: Most likely you need to configure your SUID sandbox correctly
11:12:46 Main: [0903/111246.159445:ERROR:nacl_helper_linux.cc(308)] NaCl helper process running without a sandbox!
11:12:46 Main: Opening in existing browser session.
Comment 10 Laszlo 2020-09-03 09:23:43 UTC
laci@sanyika:~$ sudo dpkg -S $(which GpuWatchdog)
[sudo] password for laci: 
dpkg-query: error: --search needs at least one file name pattern argument
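
The dpkg-query error above is what dpkg -S reports when it is given no argument, which suggests that "which GpuWatchdog" printed nothing, i.e. no executable of that name is on the PATH. A minimal sketch of a guarded lookup follows; owner_of is a hypothetical helper name (not part of dpkg), and it assumes a Debian-style system where dpkg is installed.

```shell
# Hypothetical helper: ask dpkg which package owns a command, but only if
# the name actually resolves to an executable file on PATH.
# The body runs in a subshell, so its variables do not leak to the caller.
owner_of() (
    name=$1
    # command -v prints the full path for executables found on PATH;
    # it prints nothing (and fails) for unknown names.
    path=$(command -v "$name" 2>/dev/null) || path=
    case $path in
        /*) dpkg -S "$path" ;;   # a real file: let dpkg search for its owner
        *)  echo "$name: no executable of that name on PATH" >&2
            exit 1 ;;
    esac
)
```

For example, owner_of GpuWatchdog would print the explanatory message instead of the dpkg-query error on a system where GpuWatchdog is not an installed binary.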
Comment 11 Borislav Petkov 2020-09-03 09:49:07 UTC
Hmm, so it looks like that GpuWatchdog thing is a Chrome thread. And the error in your dmesg before it:

[  266.208179] [TTM] Buffer eviction failed
[  266.346883] GpuWatchdog[2984]: segfault at 0 ip

says that a buffer eviction in TTM fails, which probably causes the watchdog to segfault. In any case, this is a DRM issue, not a platform one. Reassigning...
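
To pull just these events out of a long log, a filter along the following lines may help. This is only a sketch: gpu_events is a hypothetical name, and the patterns are taken from the messages quoted in this report, so other kernels may word them differently.

```shell
# Hypothetical filter: read kernel log text on stdin and keep only the
# lines relevant here (nouveau init failures, TTM eviction failures,
# and segfault reports). Patterns mirror the messages quoted in this bug.
gpu_events() {
    grep -E 'nouveau .*init failed|\[TTM\] Buffer eviction failed|segfault at'
}
```

It could be used as "dmesg | gpu_events" on a live system, or "gpu_events < dmesg.txt" on a saved log (dmesg.txt being a placeholder file name).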

What you could do for starters is try the latest kernel 5.8 to check whether this has been fixed in the meantime.
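
Whether the running kernel is already at 5.8 or later can be checked with a sketch like this one. version_ge is a hypothetical helper, and it assumes a sort that supports -V (version sort), as GNU coreutils does.

```shell
# Hypothetical helper: succeed if version $1 is at least version $2.
# Relies on "sort -V" (version sort); if the smaller of the two sorted
# values is $2, then $1 is greater than or equal to it.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

A possible use is: version_ge "$(uname -r)" 5.8 && echo "already on 5.8 or newer". With the kernel reported in this bug, 5.4.0-42-generic, the check fails, so an upgrade would be needed for the test.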

HTH.
Comment 12 Laszlo 2020-09-03 10:27:10 UTC
OK, then what shall I do?
Comment 13 Borislav Petkov 2020-09-03 10:59:53 UTC
> Ok, then what shall I do?

Did you not read this?

"What you could do for starters is try the latest kernel 5.8 to check whether this has been fixed in the meantime."
Comment 14 Laszlo 2020-09-03 11:28:48 UTC
Then please assign someone, in the component this was reassigned to, who can examine this bug, verify the root cause, and assess whether it has been fixed.

They can then suggest the appropriate kernel and, probably, driver.

I am a user who needs some support with the correction procedure.
Comment 15 Ken Sharp 2023-12-08 08:00:56 UTC
I've opened a bug report at Launchpad for this, but I cannot say for certain that there is a kernel bug here.

https://bugs.launchpad.net/ubuntu/+source/linux-signed-hwe-6.2/+bug/2045951
