Bug 29022
Summary: | [REGRESSION? 2.6.38-rc4] nouveau NV50/NVA8 screen freeze | ||
---|---|---|---|
Product: | Drivers | Reporter: | Marc B. (kernel.org) |
Component: | Video(DRI - non Intel) | Assignee: | drivers_video-dri |
Status: | RESOLVED OBSOLETE | ||
Severity: | normal | CC: | akpm, alan, dev, florian, maciej.rutecki, rjw |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.38-rc4 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | 2.6.37 - no freeze; 2.6.38-rc4-dezzy-00482-ga0dc00b-dirty - freeze |
Description
Marc B.
2011-02-13 12:29:13 UTC
Created attachment 47612 [details]
2.6.37 - no freeze
Created attachment 47622 [details]
2.6.38-rc4-dezzy-00482-ga0dc00b-dirty - freeze
"[mi] EQ overflowing. The server is probably stuck in an infinite loop." was the message appearing on 2.6.38+ ...

See also: "Chris Clayton - System lockup with 2.6.38-rc4+" on LKML

I have the slight assumption that the problem is more often triggered with the VirtualBox kernel modules loaded. I know that you guys consider this setup 'Tainted', however, it's a real-world setup that used to work.

(In reply to comment #5)
> I have the slight assumption that the problem is more often triggered with
> the VirtualBox kernel modules loaded. I know that you guys consider this setup
> 'Tainted', however, it's a real-world setup that used to work.

This was not true. The behavior persists with -rc7. The screen will _always_ freeze after about 3 - 4 minutes with mesa-HEAD and xorg 1.10.0. 2.6.37 is fine.

Hey, we have something in the logs:

Mar 5 09:57:44 marc kernel: [71218.224538] [drm] nouveau 0000:01:00.0: PGRAPH_TRAP - Ch 4/5 Class 0x8597 Mthd 0x15e0 Data 0x00000000:0x00000000
Mar 5 09:57:44 marc kernel: [71218.224554] [drm] nouveau 0000:01:00.0: PGRAPH_TRAP_MP_EXEC - TP 0 MP 0: INVALID_OPCODE at 080000 warp 0, opcode 000033cc 00ffffff
Mar 5 09:57:44 marc kernel: [71218.224705] [drm] nouveau 0000:01:00.0: PGRAPH_TRAP - Ch 4/5 Class 0x8597 Mthd 0x15e0 Data 0x00000000:0x00000000
Mar 5 09:57:44 marc kernel: [71218.224717] [drm] nouveau 0000:01:00.0: PGRAPH_TRAP_MP_EXEC - TP 0 MP 0: INVALID_OPCODE at 080000 warp 0, opcode 000033cc 00ffffff
Mar 5 09:57:44 marc kernel: [71218.224726] [drm] nouveau 0000:01:00.0: PGRAPH_TRAP_MP_EXEC - TP 0 MP 1: INVALID_OPCODE at 083008 warp 1, opcode 00f8f3e6 00f8f3e6
Mar 5 09:57:44 marc kernel: [71218.224899] [drm] nouveau 0000:01:00.0: PGRAPH_TRAP - Ch 4/5 Class 0x8597 Mthd 0x15e0 Data 0x00000000:0x00000000
Mar 5 09:57:44 marc kernel: [71218.224911] [drm] nouveau 0000:01:00.0: PGRAPH_TRAP_MP_EXEC - TP 0 MP 0: INVALID_OPCODE at 080000 warp 0, opcode 000033cc 00ffffff
Mar 5 09:57:44 marc kernel: [71218.224921] [drm] nouveau 0000:01:00.0: PGRAPH_TRAP_MP_EXEC - TP 0 MP 1: INVALID_OPCODE at 083008 warp 1, opcode 00f8f3e6 00f8f3e6
Mar 5 09:57:44 marc kernel: [71218.224992] [drm] nouveau 0000:01:00.0: PGRAPH_TRAP - Ch 4/5 Class 0x8597 Mthd 0x15e0 Data 0x00000000:0x00000000
Mar 5 09:57:44 marc kernel: [71218.225003] [drm] nouveau 0000:01:00.0: PGRAPH_TRAP_MP_EXEC - TP 0 MP 0: INVALID_OPCODE at 080000 warp 0, opcode 000033cc 00ffffff
Mar 5 09:57:44 marc kernel: [71218.225013] [drm] nouveau 0000:01:00.0: PGRAPH_TRAP_MP_EXEC - TP 0 MP 1: INVALID_OPCODE at 083008 warp 1, opcode 00f8f3e6 00f8f3e6
Mar 5 09:57:44 marc kernel: [71218.225162] [drm] nouveau 0000:01:00.0: PGRAPH_TRAP - Ch 4/5 Class 0x8597 Mthd 0x15e0 Data 0x00000000:0x00000000
Mar 5 09:57:44 marc kernel: [71218.225168] [drm] nouveau 0000:01:00.0: PGRAPH_TRAP_MP - TP0: Unhandled ustatus 0x00020000
Mar 5 09:58:11 marc kernel: [71246.831804] [drm] nouveau 0000:01:00.0: nouveau_channel_free: freeing fifo 4
Mar 5 09:58:24 marc kernel: [71259.463319] [drm] nouveau 0000:01:00.0: Allocating FIFO number 4
Mar 5 09:58:24 marc kernel: [71259.473828] [drm] nouveau 0000:01:00.0: nouveau_channel_alloc: initialised FIFO 4

This, however, seems to be something different. When I reported the bug there were no such logs. Maybe they just didn't get into kern.log, or there are two bugs. If you need more info on something, please ask.

If this problem occurs on an NVA8, it's not a regression. NVA3+ have some random lockups that nobody has figured out the cause of. So it never worked okay, although some changes in the codebase could have made it more likely to trigger the lockup. I strongly suspect the PGRAPH_TRAP is an unrelated bug, since we never saw anything in the logs after a random lockup.

I updated the whole X11 stuff to current git and the problem seems to be gone.
I have the freezing kernel running for about 15 minutes now and tried everything to make it freeze (composite WM, Firefox, several glxgears instances and fullscreen terminals just doing some stuff to make them print a lot of text (like cat'ing dmesg in loops)). Weird. ... but the workaround cannot be to upgrade X11 to HEAD. :) Any hints on what one could do to track this down?

OK, consider this as 'not said'. Just a second after I hit 'Commit' to send the post, the desktop froze. :/

Do you see this freeze on NVA8 only or also on other cards? If it is only NVA8 there is nothing you can do, as nobody knows how to tackle this bug. You may pray that Ben finds a way to reproduce it so he can hopefully fix this.

I have no other card to test over here. :/

rc8 still fails to behave like 2.6.37. Is this really going to be rolled out?

This _is_ already rolled out, as every nouveau version so far has locked up randomly on NVA8. It seems you were enormously lucky not to see these lockups on .37.

To be honest, I doubt _this_ is rolled out. The behavior might be there in 2.6.37, but in 2.6.38 it seems to be worse. With 2.6.37 I have random GPU lock-ups about once a month (I consider this normal, as even NVidia itself has problems like this with their drivers). The recent 2.6.38-rc kernels actually freeze after a few minutes. And this can be reproduced. But OK, since you seem to say that there's nothing we could do about it at this point - as nobody seems to know a solution - what do we do with the bug report? Close it? Sad to hear I won't be able to use the 2.6.38 release.

If you find the commit that made this worse, it may hint at a solution, or maybe just reverting it fixes things... so if you can use git bisect (man git-bisect), please do so. No, please leave it open for now; just remove the regression flag. Also you could add _full_ dmesg output and the versions of your userspace (xf86-video-nouveau, x-server, libdrm and, if installed, mesa).
Dmesg after the crash may also be interesting (hopefully we find something this time, but I doubt it).

Removed from 'regressions' due to request and perhaps actual sense.

Currently I'd already like to offer money for this to get fixed. Damn thing, this seems to be so tricky... :/

nouveau.noaccel=1 'fixes' the issue. I have an uptime of 4 days now with 2.6.38.2 without any problem. But performance-wise .... uhm.

If this bug is still seen with 3.2/3.4+ kernels, please re-open.
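
For anyone triaging similar logs, here is a minimal Python sketch (not part of the original report) that groups nouveau PGRAPH_TRAP lines by message so repeated traps stand out. The regular expression is an assumption derived only from the log lines quoted in this report, and the names `summarize_traps` and `SAMPLE` are hypothetical.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpts in this report:
#   ... kernel: [71218.224554] [drm] nouveau 0000:01:00.0: PGRAPH_TRAP_MP_EXEC - TP 0 MP 0: ...
TRAP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] \[drm\] nouveau \S+ "
    r"(?P<kind>PGRAPH_TRAP\w*) - (?P<detail>.*)$"
)


def summarize_traps(lines):
    """Count identical (kind, detail) trap messages in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = TRAP_RE.search(line.rstrip("\n"))
        if match:  # lines that are not PGRAPH traps are ignored
            counts[(match["kind"], match["detail"])] += 1
    return counts


# A few lines copied from the report, used only as sample input.
SAMPLE = [
    "Mar 5 09:57:44 marc kernel: [71218.224554] [drm] nouveau 0000:01:00.0: "
    "PGRAPH_TRAP_MP_EXEC - TP 0 MP 0: INVALID_OPCODE at 080000 warp 0, opcode 000033cc 00ffffff",
    "Mar 5 09:57:44 marc kernel: [71218.224717] [drm] nouveau 0000:01:00.0: "
    "PGRAPH_TRAP_MP_EXEC - TP 0 MP 0: INVALID_OPCODE at 080000 warp 0, opcode 000033cc 00ffffff",
    "Mar 5 09:57:44 marc kernel: [71218.224726] [drm] nouveau 0000:01:00.0: "
    "PGRAPH_TRAP_MP_EXEC - TP 0 MP 1: INVALID_OPCODE at 083008 warp 1, opcode 00f8f3e6 00f8f3e6",
    "Mar 5 09:58:11 marc kernel: [71246.831804] [drm] nouveau 0000:01:00.0: "
    "nouveau_channel_free: freeing fifo 4",
]

summary = summarize_traps(SAMPLE)
for (kind, detail), count in summary.most_common():
    print(f"{count:4d}  {kind}: {detail}")
```

Passing an open file object for a saved kern.log instead of `SAMPLE` should give the same kind of summary for a whole log, which may help tell whether the traps are one repeating fault or several.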