Bug 201847

Summary: nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 000000000a721000 engine 00 [GR] client 0f [GPC0/PROP_0] reason 82 [] on channel 4 [00ff85c000 X[3819]]
Product: Drivers
Reporter: Marc B. (kernel.org)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: RESOLVED ANSWERED
Severity: normal
CC: hcoin, interface, simonfogliato
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 4.19.6
Subsystem:
Regression: No
Bisected commit-id:

Description Marc B. 2018-12-02 21:47:25 UTC
Dec  2 21:05:25 local kernel: [    0.955901] nouveau 0000:01:00.0: NVIDIA GM107 (117300a2)
Dec  2 21:05:25 local kernel: [    0.992024] nouveau 0000:01:00.0: bios: version 82.07.9d.00.14
Dec  2 21:05:25 local kernel: [    0.993477] nouveau 0000:01:00.0: fb: 4096 MiB GDDR5
Dec  2 21:05:25 local kernel: [    0.993527] nouveau 0000:01:00.0: bus: MMIO read of 00000000 FAULT at 001228 [ IBUS ]
Dec  2 21:05:25 local kernel: [    1.008241] nouveau 0000:01:00.0: bus: MMIO read of 00000000 FAULT at 10ac08 [ IBUS ]
Dec  2 21:05:25 local kernel: [    1.061536] nouveau 0000:01:00.0: DRM: VRAM: 4096 MiB
Dec  2 21:05:25 local kernel: [    1.061539] nouveau 0000:01:00.0: DRM: GART: 1048576 MiB
Dec  2 21:05:25 local kernel: [    1.061543] nouveau 0000:01:00.0: DRM: TMDS table version 2.0
Dec  2 21:05:25 local kernel: [    1.061546] nouveau 0000:01:00.0: DRM: DCB version 4.0
Dec  2 21:05:25 local kernel: [    1.061549] nouveau 0000:01:00.0: DRM: DCB outp 00: 04800fb6 04420010
Dec  2 21:05:25 local kernel: [    1.061552] nouveau 0000:01:00.0: DRM: DCB outp 01: 02011fa6 04420010
Dec  2 21:05:25 local kernel: [    1.061555] nouveau 0000:01:00.0: DRM: DCB outp 02: 02011f62 00020010
Dec  2 21:05:25 local kernel: [    1.061558] nouveau 0000:01:00.0: DRM: DCB outp 03: 08022fc6 04420010
Dec  2 21:05:25 local kernel: [    1.061561] nouveau 0000:01:00.0: DRM: DCB outp 04: 08022f82 00020010
Dec  2 21:05:25 local kernel: [    1.061564] nouveau 0000:01:00.0: DRM: DCB outp 05: 01033fd6 04420020
Dec  2 21:05:25 local kernel: [    1.061567] nouveau 0000:01:00.0: DRM: DCB outp 06: 01033f92 00020020
Dec  2 21:05:25 local kernel: [    1.061570] nouveau 0000:01:00.0: DRM: DCB conn 00: 00002047
Dec  2 21:05:25 local kernel: [    1.061573] nouveau 0000:01:00.0: DRM: DCB conn 01: 00001146
Dec  2 21:05:25 local kernel: [    1.061575] nouveau 0000:01:00.0: DRM: DCB conn 02: 00010246
Dec  2 21:05:25 local kernel: [    1.061578] nouveau 0000:01:00.0: DRM: DCB conn 03: 00020346
Dec  2 21:05:25 local kernel: [    1.433020] nouveau 0000:01:00.0: DRM: MM: using COPY for buffer copies
Dec  2 21:05:25 local kernel: [    1.535562] nouveau 0000:01:00.0: DRM: allocated 1920x1080 fb: 0x80000, bo 0000000071889fdf
Dec  2 21:05:25 local kernel: [    1.853891] nouveau 0000:01:00.0: disp: 0x00006671[0]: INIT_GENERIC_CONDITON: unknown 0x07
Dec  2 21:05:25 local kernel: [    2.034030] nouveau 0000:01:00.0: fb0: nouveaufb frame buffer device
Dec  2 21:05:25 local kernel: [    2.034061] [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 0



Dec  2 22:35:07 local kernel: [ 5422.645466] nouveau 0000:01:00.0: gr: TRAP ch 4 [00ff85c000 X[3819]]
Dec  2 22:35:07 local kernel: [ 5422.645475] nouveau 0000:01:00.0: gr: GPC0/TPC3/MP trap: global 00000000 [] warp 3c000d [OOR_REG]
Dec  2 22:35:07 local kernel: [ 5422.646304] nouveau 0000:01:00.0: gr: TRAP ch 4 [00ff85c000 X[3819]]
Dec  2 22:35:07 local kernel: [ 5422.646316] nouveau 0000:01:00.0: gr: GPC0/PROP trap: 00000200 [] x = 0, y = 0, format = 0, storage type = fe
Dec  2 22:35:07 local kernel: [ 5422.646334] nouveau 0000:01:00.0: gr: TRAP ch 4 [00ff85c000 X[3819]]
Dec  2 22:35:07 local kernel: [ 5422.646346] nouveau 0000:01:00.0: gr: GPC0/PROP trap: 00000200 [] x = 384, y = 74, format = 0, storage type = fe
Dec  2 22:35:07 local kernel: [ 5422.646362] nouveau 0000:01:00.0: gr: TRAP ch 4 [00ff85c000 X[3819]]
Dec  2 22:35:07 local kernel: [ 5422.646373] nouveau 0000:01:00.0: gr: GPC0/PROP trap: 00000200 [] x = 352, y = 152, format = 0, storage type = fe
Dec  2 22:35:07 local kernel: [ 5422.646388] nouveau 0000:01:00.0: gr: TRAP ch 4 [00ff85c000 X[3819]]
Dec  2 22:35:07 local kernel: [ 5422.646399] nouveau 0000:01:00.0: gr: GPC0/PROP trap: 00000200 [] x = 448, y = 268, format = 0, storage type = fe
Dec  2 22:35:07 local kernel: [ 5422.646418] nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 000000000a721000 engine 00 [GR] client 0f [GPC0/PROP_0] reason 82 [] on channel 4 [00ff85c000 X[3819]]
Dec  2 22:35:07 local kernel: [ 5422.646425] nouveau 0000:01:00.0: fifo: channel 4: killed
Dec  2 22:35:07 local kernel: [ 5422.646427] nouveau 0000:01:00.0: fifo: runlist 0: scheduled for recovery
Dec  2 22:35:07 local kernel: [ 5422.646432] nouveau 0000:01:00.0: fifo: engine 0: scheduled for recovery
Dec  2 22:35:07 local kernel: [ 5422.646437] nouveau 0000:01:00.0: X[3819]: channel 4 killed!
Dec  2 22:35:31 local kernel: [ 5446.744051] sysrq: SysRq : Keyboard mode set to system default
Dec  2 22:35:32 local kernel: [ 5447.080135] sysrq: SysRq : Terminate All Tasks
Comment 1 Marc B. 2019-05-05 15:18:50 UTC
It would be soooo cool if anyone would actually read this bug report and maybe try to fix it. I will assist in testing patches until this is resolved.

And: I am willing to offer $100 for fixing this annoying bug! It keeps freezing my 4.19.39 kernel out of nowhere.

Some things I would like to bring up for discussion:

a) it might have something to do with memory pressure

_and_

b) high CPU load _or_ a high number of context switches.

I'm not sure about the latter. The bug always occurs when I, e.g., compile two kernels with -j24 and have some other work going on besides, say a YouTube video. It is, however, definitely triggered by a graphics event, e.g. resizing or creating a window, scrolling a web page, or watching a video.


[2019-05-04 15:43:24] err kern 03 kernel : [  523.906459] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[2019-05-04 15:43:24] notice kern 05 kernel : [  523.906467] nouveau 0000:01:00.0: fifo: runlist 0: scheduled for recovery
[2019-05-04 15:43:24] notice kern 05 kernel : [  523.906473] nouveau 0000:01:00.0: fifo: channel 2: killed
[2019-05-04 15:43:24] notice kern 05 kernel : [  523.906479] nouveau 0000:01:00.0: fifo: engine 0: scheduled for recovery
[2019-05-04 15:43:24] warning kern 04 kernel : [  523.906789] nouveau 0000:01:00.0: X[8006]: channel 2 killed!
[2019-05-04 15:44:24] info kern 06 kernel : [  584.121331] sysrq: SysRq : Keyboard mode set to system default
Comment 2 Harry Coin 2019-07-21 15:08:07 UTC
Here's another freeze report, from:
$ uname -a
Linux ceo1homenx 5.2.0-8-generic #9-Ubuntu SMP Mon Jul 8 13:07:27 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Syslog just before the lock-up:

Jul 21 09:45:20 ceo1homenx kernel: [89849.919490] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
Jul 21 09:45:20 ceo1homenx kernel: [89849.919500] nouveau 0000:01:00.0: fifo: runlist 0: scheduled for recovery
Jul 21 09:45:20 ceo1homenx kernel: [89849.919506] nouveau 0000:01:00.0: fifo: channel 8: killed
Jul 21 09:45:20 ceo1homenx kernel: [89849.919511] nouveau 0000:01:00.0: fifo: engine 0: scheduled for recovery
Jul 21 09:45:20 ceo1homenx kernel: [89849.919815] nouveau 0000:01:00.0: Xorg[1546]: channel 8 killed!
-- hard lock --
Comment 3 interface 2019-08-13 14:00:14 UTC
I think I have this issue, too:

→ uname -a
Linux sticke 4.19.66-1-MANJARO #1 SMP PREEMPT Fri Aug 9 18:01:53 UTC 2019 x86_64 GNU/Linux
→ journalctl -b-1
Aug 13 15:36:55 sticke kernel: nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 0000000000240000 engine 00 [GR] client 0f [GPC0/PROP_0] reason 82 [] on channel 2 [003fbec000 Xorg[634]]
Aug 13 15:36:55 sticke kernel: nouveau 0000:01:00.0: fifo: channel 2: killed
Aug 13 15:36:55 sticke kernel: nouveau 0000:01:00.0: fifo: runlist 0: scheduled for recovery
Aug 13 15:36:55 sticke kernel: nouveau 0000:01:00.0: fifo: engine 0: scheduled for recovery
Aug 13 15:36:55 sticke kernel: nouveau 0000:01:00.0: Xorg[634]: channel 2 killed!

BTW, I am running exactly the same OS (from a USB stick) on different hardware where this problem does not occur. Please let me know if I should post more details (and which ones).
Comment 4 Simon Fogliato 2023-08-16 16:02:56 UTC
sf@sf-T3600 ~ % uname -a
Linux sf-T3600 6.4.10-arch1-1 #1 SMP PREEMPT_DYNAMIC Fri, 11 Aug 2023 11:03:36 +0000 x86_64 GNU/Linux

Aug 16 08:21:06 sf-T3600 kernel: nouveau 0000:03:00.0: fifo: fault 01 [WRITE] at 000000000002e000 engine 15 [PCE0] client 01 [HUB/PCOPY0] reason 02 [PAGE_NOT_PRES>
Aug 16 08:21:06 sf-T3600 kernel: nouveau 0000:03:00.0: fifo:000000:0001:[(udev-worker)[738]] rc scheduled
Aug 16 08:21:06 sf-T3600 kernel: nouveau 0000:03:00.0: fifo:000000: rc scheduled
Aug 16 08:21:06 sf-T3600 kernel: nouveau 0000:03:00.0: fifo:000000:0001:0001:[(udev-worker)[738]] errored - disabling channel
Aug 16 08:21:06 sf-T3600 kernel: nouveau 0000:03:00.0: DRM: channel 1 killed!
Aug 16 08:21:08 sf-T3600 kernel: sched: RT throttling activated
Aug 16 08:21:44 sf-T3600 rtkit-daemon[1065]: Supervising 8 threads of 5 processes of 1 users.
Aug 16 08:21:44 sf-T3600 rtkit-daemon[1065]: Supervising 8 threads of 5 processes of 1 users.
Aug 16 08:21:49 sf-T3600 kernel: ------------[ cut here ]------------
Aug 16 08:21:49 sf-T3600 kernel: WARNING: CPU: 0 PID: 19149 at mm/gup.c:1101 __get_user_pages+0x582/0x680
Aug 16 08:21:49 sf-T3600 kernel: Modules linked in: snd_seq_dummy snd_hrtimer snd_seq rfkill intel_rapl_msr intel_rapl_common uvcvideo x86_pkg_temp_thermal intel_>
Aug 16 08:21:49 sf-T3600 kernel:  crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni uas polyval_generic usb_storage usbhid gf128mul ghash_clmulni_intel s>
Aug 16 08:21:49 sf-T3600 kernel: CPU: 0 PID: 19149 Comm: chrome_crashpad Tainted: G        W  OE      6.4.10-arch1-1 #1 2d4402bf7ad4a7ea488c9261840b8101c9d1e712
Aug 16 08:21:49 sf-T3600 kernel: Hardware name: Dell Inc. Precision T3600/08HPGT, BIOS A15 05/08/2017
Aug 16 08:21:49 sf-T3600 kernel: RIP: 0010:__get_user_pages+0x582/0x680
Aug 16 08:21:49 sf-T3600 kernel: Code: 00 e9 cb fd ff ff 48 03 bd 88 00 00 00 e9 c7 fb ff ff 48 81 e1 00 f0 ff ff e9 4b fc ff ff 48 81 e2 00 f0 ff ff e9 b5 fc ff >
Aug 16 08:21:49 sf-T3600 kernel: RSP: 0018:ffffb531ccc17bf8 EFLAGS: 00010202
Aug 16 08:21:49 sf-T3600 kernel: RAX: ffff94633a009cc0 RBX: 000000000005000a RCX: 00007ffdc4e02fff
Aug 16 08:21:49 sf-T3600 kernel: RDX: 0000000000000000 RSI: 00007eff84e4b000 RDI: ffff9463c986a080
Aug 16 08:21:49 sf-T3600 kernel: RBP: ffff946398f0bc80 R08: ffff94633a0b8008 R09: 0000000000000001
Aug 16 08:21:49 sf-T3600 kernel: R10: ffff94633a0b8080 R11: ffff94633a0b800c R12: 0000000000000000
Aug 16 08:21:49 sf-T3600 kernel: R13: ffff94633a009cc0 R14: ffffb531ccc17cbc R15: ffffb531ccc17cbc
Aug 16 08:21:49 sf-T3600 kernel: FS:  00007f7035d135c0(0000) GS:ffff946a2f600000(0000) knlGS:0000000000000000
Aug 16 08:21:49 sf-T3600 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 16 08:21:49 sf-T3600 kernel: CR2: 000036cc0040c300 CR3: 000000018e95e004 CR4: 00000000000626f0
Aug 16 08:21:49 sf-T3600 kernel: Call Trace:
Aug 16 08:21:49 sf-T3600 kernel:  <TASK>
Aug 16 08:21:49 sf-T3600 kernel:  ? __get_user_pages+0x582/0x680
Aug 16 08:21:49 sf-T3600 kernel:  ? __warn+0x81/0x130
Aug 16 08:21:49 sf-T3600 kernel:  ? __get_user_pages+0x582/0x680
Aug 16 08:21:49 sf-T3600 kernel:  ? report_bug+0x171/0x1a0
Aug 16 08:21:49 sf-T3600 kernel:  ? handle_bug+0x3c/0x80
Aug 16 08:21:49 sf-T3600 kernel:  ? exc_invalid_op+0x17/0x70
Aug 16 08:21:49 sf-T3600 kernel:  ? asm_exc_invalid_op+0x1a/0x20
Aug 16 08:21:49 sf-T3600 kernel:  ? __get_user_pages+0x582/0x680
Aug 16 08:21:49 sf-T3600 kernel:  ? __get_user_pages+0x8a/0x680
Aug 16 08:21:49 sf-T3600 kernel:  get_user_pages_remote+0x14a/0x400
Aug 16 08:21:49 sf-T3600 kernel:  __access_remote_vm+0x1bf/0x420
Aug 16 08:21:49 sf-T3600 kernel:  mem_rw.isra.0+0x111/0x1d0
Aug 16 08:21:49 sf-T3600 kernel:  vfs_read+0xac/0x320
Aug 16 08:21:49 sf-T3600 kernel:  ? mem_rw.isra.0+0x18a/0x1d0
Aug 16 08:21:49 sf-T3600 kernel:  ? vfs_read+0xac/0x320
Aug 16 08:21:49 sf-T3600 kernel:  __x64_sys_pread64+0x98/0xd0
Aug 16 08:21:49 sf-T3600 kernel:  do_syscall_64+0x60/0x90
Aug 16 08:21:49 sf-T3600 kernel:  ? __x64_sys_pread64+0xa8/0xd0
Aug 16 08:21:49 sf-T3600 kernel:  ? syscall_exit_to_user_mode+0x1b/0x40
Aug 16 08:21:49 sf-T3600 kernel:  ? do_syscall_64+0x6c/0x90
Aug 16 08:21:49 sf-T3600 kernel:  ? do_syscall_64+0x6c/0x90
Aug 16 08:21:49 sf-T3600 kernel:  entry_SYSCALL_64_after_hwframe+0x77/0xe1
Aug 16 08:21:49 sf-T3600 kernel: RIP: 0033:0x7f7035ae8d07
Aug 16 08:21:49 sf-T3600 kernel: Code: 08 89 3c 24 48 89 4c 24 18 e8 85 00 fa ff 4c 8b 54 24 18 48 8b 54 24 10 41 89 c0 48 8b 74 24 08 8b 3c 24 b8 11 00 00 00 0f >
Aug 16 08:21:49 sf-T3600 kernel: RSP: 002b:00007fff5f79a8f0 EFLAGS: 00000293 ORIG_RAX: 0000000000000011
Aug 16 08:21:49 sf-T3600 kernel: RAX: ffffffffffffffda RBX: 0000000000001000 RCX: 00007f7035ae8d07
Aug 16 08:21:49 sf-T3600 kernel: RDX: 0000000000001000 RSI: 00007fff5f79abe0 RDI: 0000000000000007
Aug 16 08:21:49 sf-T3600 kernel: RBP: 00007fff5f79aa90 R08: 0000000000000000 R09: 000055f24f499c20
Aug 16 08:21:49 sf-T3600 kernel: R10: 00007eff84e4a880 R11: 0000000000000293 R12: 00007eff84e4a880
Aug 16 08:21:49 sf-T3600 kernel: R13: 000036cc00218380 R14: 00007fff5f79abe0 R15: 0000000000001000
Aug 16 08:21:49 sf-T3600 kernel:  </TASK>
Aug 16 08:21:49 sf-T3600 kernel: ---[ end trace 0000000000000000 ]---
Aug 16 08:22:08 sf-T3600 systemd[1]: Started Getty on tty5.
Comment 5 Artem S. Tashkinov 2023-08-17 10:12:37 UTC
Please refile here https://gitlab.freedesktop.org/drm/nouveau/-/issues/

4.19.6 is terribly old and outdated regardless.

Please try at least 4.19.292.
Comment 6 Simon Fogliato 2023-08-17 14:20:29 UTC
I copied my info and created an issue here:

https://gitlab.freedesktop.org/drm/nouveau/-/issues/256