Bug 201847
Summary: | nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 000000000a721000 engine 00 [GR] client 0f [GPC0/PROP_0] reason 82 [] on channel 4 [00ff85c000 X[3819]] | ||
---|---|---|---|
Product: | Drivers | Reporter: | Marc B. (kernel.org) |
Component: | Video(DRI - non Intel) | Assignee: | drivers_video-dri |
Status: | RESOLVED ANSWERED | ||
Severity: | normal | CC: | hcoin, interface, simonfogliato |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 4.19.6 | Subsystem: | |
Regression: | No | Bisected commit-id: |
Description
Marc B.
2018-12-02 21:47:25 UTC
It would be soooo cool if anyone would actually read this bug report and maybe try to fix it. I will assist in testing patches until this is resolved. And: I am willing to offer $100 for fixing this annoying bug! It keeps freezing my 4.19.39 kernel out of nowhere.

Some things I would like to get into discussion:

a) it might have something to do with memory pressure _and_
b) high CPU load _or_ a high number of context switches. For the latter I'm not sure.

The bug always occurs when I, e.g., compile two kernels at -j24 and have some other work going on besides this, say a YT video. The bug is, however, definitely triggered by a graphics event, e.g. resizing/creating a window, scrolling a Web page or watching a video.

[2019-05-04 15:43:24] err kern 03 kernel : [ 523.906459] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[2019-05-04 15:43:24] notice kern 05 kernel : [ 523.906467] nouveau 0000:01:00.0: fifo: runlist 0: scheduled for recovery
[2019-05-04 15:43:24] notice kern 05 kernel : [ 523.906473] nouveau 0000:01:00.0: fifo: channel 2: killed
[2019-05-04 15:43:24] notice kern 05 kernel : [ 523.906479] nouveau 0000:01:00.0: fifo: engine 0: scheduled for recovery
[2019-05-04 15:43:24] warning kern 04 kernel : [ 523.906789] nouveau 0000:01:00.0: X[8006]: channel 2 killed!
[2019-05-04 15:43:24] err kern 03 kernel : nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[2019-05-04 15:43:24] notice kern 05 kernel : nouveau 0000:01:00.0: fifo: runlist 0: scheduled for recovery
[2019-05-04 15:43:24] notice kern 05 kernel : nouveau 0000:01:00.0: fifo: channel 2: killed
[2019-05-04 15:43:24] notice kern 05 kernel : nouveau 0000:01:00.0: fifo: engine 0: scheduled for recovery
[2019-05-04 15:43:24] warning kern 04 kernel : nouveau 0000:01:00.0: X[8006]: channel 2 killed!
[2019-05-04 15:44:24] info kern 06 kernel : [ 584.121331] sysrq: SysRq : Keyboard mode set to system default
[2019-05-04 15:44:24] info kern 06 kernel : sysrq: SysRq : Keyboard mode set to system default

Here's another freeze report:

From
$ uname -a
Linux ceo1homenx 5.2.0-8-generic #9-Ubuntu SMP Mon Jul 8 13:07:27 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Syslog just before the lock:

Jul 21 09:45:20 ceo1homenx kernel: [89849.919490] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
Jul 21 09:45:20 ceo1homenx kernel: [89849.919500] nouveau 0000:01:00.0: fifo: runlist 0: scheduled for recovery
Jul 21 09:45:20 ceo1homenx kernel: [89849.919506] nouveau 0000:01:00.0: fifo: channel 8: killed
Jul 21 09:45:20 ceo1homenx kernel: [89849.919511] nouveau 0000:01:00.0: fifo: engine 0: scheduled for recovery
Jul 21 09:45:20 ceo1homenx kernel: [89849.919815] nouveau 0000:01:00.0: Xorg[1546]: channel 8 killed!
-- hard lock --

I think I have this issue, too:

→ uname -a
Linux sticke 4.19.66-1-MANJARO #1 SMP PREEMPT Fri Aug 9 18:01:53 UTC 2019 x86_64 GNU/Linux

→ journalctl -b-1
Aug 13 15:36:55 sticke kernel: nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 0000000000240000 engine 00 [GR] client 0f [GPC0/PROP_0] reason 82 [] on channel 2 [003fbec000 Xorg[634]]
Aug 13 15:36:55 sticke kernel: nouveau 0000:01:00.0: fifo: channel 2: killed
Aug 13 15:36:55 sticke kernel: nouveau 0000:01:00.0: fifo: runlist 0: scheduled for recovery
Aug 13 15:36:55 sticke kernel: nouveau 0000:01:00.0: fifo: engine 0: scheduled for recovery
Aug 13 15:36:55 sticke kernel: nouveau 0000:01:00.0: Xorg[634]: channel 2 killed!

By the way, I am running exactly the same OS (from a USB stick) on different hardware, where this problem does not occur. Please let me know if I should post more details (and which ones).
sf@sf-T3600 ~ % uname -a
Linux sf-T3600 6.4.10-arch1-1 #1 SMP PREEMPT_DYNAMIC Fri, 11 Aug 2023 11:03:36 +0000 x86_64 GNU/Linux

Aug 16 08:21:06 sf-T3600 kernel: nouveau 0000:03:00.0: fifo: fault 01 [WRITE] at 000000000002e000 engine 15 [PCE0] client 01 [HUB/PCOPY0] reason 02 [PAGE_NOT_PRES>
Aug 16 08:21:06 sf-T3600 kernel: nouveau 0000:03:00.0: fifo:000000:0001:[(udev-worker)[738]] rc scheduled
Aug 16 08:21:06 sf-T3600 kernel: nouveau 0000:03:00.0: fifo:000000: rc scheduled
Aug 16 08:21:06 sf-T3600 kernel: nouveau 0000:03:00.0: fifo:000000:0001:0001:[(udev-worker)[738]] errored - disabling channel
Aug 16 08:21:06 sf-T3600 kernel: nouveau 0000:03:00.0: DRM: channel 1 killed!
Aug 16 08:21:08 sf-T3600 kernel: sched: RT throttling activated
Aug 16 08:21:44 sf-T3600 rtkit-daemon[1065]: Supervising 8 threads of 5 processes of 1 users.
Aug 16 08:21:44 sf-T3600 rtkit-daemon[1065]: Supervising 8 threads of 5 processes of 1 users.
Aug 16 08:21:49 sf-T3600 kernel: ------------[ cut here ]------------
Aug 16 08:21:49 sf-T3600 kernel: WARNING: CPU: 0 PID: 19149 at mm/gup.c:1101 __get_user_pages+0x582/0x680
Aug 16 08:21:49 sf-T3600 kernel: Modules linked in: snd_seq_dummy snd_hrtimer snd_seq rfkill intel_rapl_msr intel_rapl_common uvcvideo x86_pkg_temp_thermal intel_>
Aug 16 08:21:49 sf-T3600 kernel: crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni uas polyval_generic usb_storage usbhid gf128mul ghash_clmulni_intel s>
Aug 16 08:21:49 sf-T3600 kernel: CPU: 0 PID: 19149 Comm: chrome_crashpad Tainted: G W OE 6.4.10-arch1-1 #1 2d4402bf7ad4a7ea488c9261840b8101c9d1e712
Aug 16 08:21:49 sf-T3600 kernel: Hardware name: Dell Inc. Precision T3600/08HPGT, BIOS A15 05/08/2017
Aug 16 08:21:49 sf-T3600 kernel: RIP: 0010:__get_user_pages+0x582/0x680
Aug 16 08:21:49 sf-T3600 kernel: Code: 00 e9 cb fd ff ff 48 03 bd 88 00 00 00 e9 c7 fb ff ff 48 81 e1 00 f0 ff ff e9 4b fc ff ff 48 81 e2 00 f0 ff ff e9 b5 fc ff >
Aug 16 08:21:49 sf-T3600 kernel: RSP: 0018:ffffb531ccc17bf8 EFLAGS: 00010202
Aug 16 08:21:49 sf-T3600 kernel: RAX: ffff94633a009cc0 RBX: 000000000005000a RCX: 00007ffdc4e02fff
Aug 16 08:21:49 sf-T3600 kernel: RDX: 0000000000000000 RSI: 00007eff84e4b000 RDI: ffff9463c986a080
Aug 16 08:21:49 sf-T3600 kernel: RBP: ffff946398f0bc80 R08: ffff94633a0b8008 R09: 0000000000000001
Aug 16 08:21:49 sf-T3600 kernel: R10: ffff94633a0b8080 R11: ffff94633a0b800c R12: 0000000000000000
Aug 16 08:21:49 sf-T3600 kernel: R13: ffff94633a009cc0 R14: ffffb531ccc17cbc R15: ffffb531ccc17cbc
Aug 16 08:21:49 sf-T3600 kernel: FS: 00007f7035d135c0(0000) GS:ffff946a2f600000(0000) knlGS:0000000000000000
Aug 16 08:21:49 sf-T3600 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 16 08:21:49 sf-T3600 kernel: CR2: 000036cc0040c300 CR3: 000000018e95e004 CR4: 00000000000626f0
Aug 16 08:21:49 sf-T3600 kernel: Call Trace:
Aug 16 08:21:49 sf-T3600 kernel: <TASK>
Aug 16 08:21:49 sf-T3600 kernel: ? __get_user_pages+0x582/0x680
Aug 16 08:21:49 sf-T3600 kernel: ? __warn+0x81/0x130
Aug 16 08:21:49 sf-T3600 kernel: ? __get_user_pages+0x582/0x680
Aug 16 08:21:49 sf-T3600 kernel: ? report_bug+0x171/0x1a0
Aug 16 08:21:49 sf-T3600 kernel: ? handle_bug+0x3c/0x80
Aug 16 08:21:49 sf-T3600 kernel: ? exc_invalid_op+0x17/0x70
Aug 16 08:21:49 sf-T3600 kernel: ? asm_exc_invalid_op+0x1a/0x20
Aug 16 08:21:49 sf-T3600 kernel: ? __get_user_pages+0x582/0x680
Aug 16 08:21:49 sf-T3600 kernel: ? __get_user_pages+0x8a/0x680
Aug 16 08:21:49 sf-T3600 kernel: get_user_pages_remote+0x14a/0x400
Aug 16 08:21:49 sf-T3600 kernel: __access_remote_vm+0x1bf/0x420
Aug 16 08:21:49 sf-T3600 kernel: mem_rw.isra.0+0x111/0x1d0
Aug 16 08:21:49 sf-T3600 kernel: vfs_read+0xac/0x320
Aug 16 08:21:49 sf-T3600 kernel: ? mem_rw.isra.0+0x18a/0x1d0
Aug 16 08:21:49 sf-T3600 kernel: ? vfs_read+0xac/0x320
Aug 16 08:21:49 sf-T3600 kernel: __x64_sys_pread64+0x98/0xd0
Aug 16 08:21:49 sf-T3600 kernel: do_syscall_64+0x60/0x90
Aug 16 08:21:49 sf-T3600 kernel: ? __x64_sys_pread64+0xa8/0xd0
Aug 16 08:21:49 sf-T3600 kernel: ? syscall_exit_to_user_mode+0x1b/0x40
Aug 16 08:21:49 sf-T3600 kernel: ? do_syscall_64+0x6c/0x90
Aug 16 08:21:49 sf-T3600 kernel: ? do_syscall_64+0x6c/0x90
Aug 16 08:21:49 sf-T3600 kernel: entry_SYSCALL_64_after_hwframe+0x77/0xe1
Aug 16 08:21:49 sf-T3600 kernel: RIP: 0033:0x7f7035ae8d07
Aug 16 08:21:49 sf-T3600 kernel: Code: 08 89 3c 24 48 89 4c 24 18 e8 85 00 fa ff 4c 8b 54 24 18 48 8b 54 24 10 41 89 c0 48 8b 74 24 08 8b 3c 24 b8 11 00 00 00 0f >
Aug 16 08:21:49 sf-T3600 kernel: RSP: 002b:00007fff5f79a8f0 EFLAGS: 00000293 ORIG_RAX: 0000000000000011
Aug 16 08:21:49 sf-T3600 kernel: RAX: ffffffffffffffda RBX: 0000000000001000 RCX: 00007f7035ae8d07
Aug 16 08:21:49 sf-T3600 kernel: RDX: 0000000000001000 RSI: 00007fff5f79abe0 RDI: 0000000000000007
Aug 16 08:21:49 sf-T3600 kernel: RBP: 00007fff5f79aa90 R08: 0000000000000000 R09: 000055f24f499c20
Aug 16 08:21:49 sf-T3600 kernel: R10: 00007eff84e4a880 R11: 0000000000000293 R12: 00007eff84e4a880
Aug 16 08:21:49 sf-T3600 kernel: R13: 000036cc00218380 R14: 00007fff5f79abe0 R15: 0000000000001000
Aug 16 08:21:49 sf-T3600 kernel: </TASK>
Aug 16 08:21:49 sf-T3600 kernel: ---[ end trace 0000000000000000 ]---
Aug 16 08:22:08 sf-T3600 systemd[1]: Started Getty on tty5.

Please refile here https://gitlab.freedesktop.org/drm/nouveau/-/issues/

4.19.6 is terribly old and outdated regardless. Please try at least 4.19.292

I copied my info and created an issue here: https://gitlab.freedesktop.org/drm/nouveau/-/issues/256
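
For anyone comparing the fault lines pasted in this thread, here is a rough Python sketch that splits a `fifo: fault` line into its fields. The pattern is inferred only from the log lines quoted in this report, not from the nouveau source, so treat the layout as an assumption; the field names (`access`, `engine`, `client`, `reason`, and so on) are labels chosen for illustration, and `parse_fault` is a hypothetical helper. Lines cut off by the journalctl pager (ending in `>`) are parsed as far as they go.

```python
import re

# Pattern inferred from the fault lines quoted in this report (an assumption,
# not taken from the driver source). Group names are illustrative labels.
FAULT_RE = re.compile(
    r"nouveau (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]): fifo: "
    r"fault (?P<access_code>[0-9a-f]{2}) \[(?P<access>[^\]]*)\] "
    r"at (?P<addr>[0-9a-f]+) "
    r"engine (?P<engine_code>[0-9a-f]{2}) \[(?P<engine>[^\]]*)\] "
    r"client (?P<client_code>[0-9a-f]{2}) \[(?P<client>[^\]]*)\] "
    r"reason (?P<reason_code>[0-9a-f]{2}) \[(?P<reason>[^\]>]*)"
    # The channel part is optional because truncated lines stop before it.
    r"(?:\] on channel (?P<channel>\d+) "
    r"\[(?P<inst>[0-9a-f]+) (?P<proc>[^\[\]]+)\[(?P<pid>\d+)\]\])?"
)


def parse_fault(line):
    """Return a dict of fault fields, or None if the line is not a fault line."""
    match = FAULT_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["addr"] = int(fields["addr"], 16)  # faulting address as an integer
    for key in ("channel", "pid"):
        if fields[key] is not None:
            fields[key] = int(fields[key])
    return fields


if __name__ == "__main__":
    sample = (
        "Aug 13 15:36:55 sticke kernel: nouveau 0000:01:00.0: fifo: fault 01 "
        "[WRITE] at 0000000000240000 engine 00 [GR] client 0f [GPC0/PROP_0] "
        "reason 82 [] on channel 2 [003fbec000 Xorg[634]]"
    )
    print(parse_fault(sample))
```

Run over a saved log, this makes it easier to see that the 2018/2019 reports fault in engine GR with client GPC0/PROP_0, while the 2023 report faults in PCE0 with client HUB/PCOPY0, so the reports may not all share one cause.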