Bug 10453
| Summary: | PROBLEM: Hard lock dual-core, x86_64, NV driver | | |
|---|---|---|---|
| Product: | Platform Specific/Hardware | Reporter: | Doug Springer (rickyrockrat) |
| Component: | x86-64 | Assignee: | platform_x86_64 (platform_x86_64) |
| Status: | CLOSED OBSOLETE | | |
| Severity: | blocking | CC: | alan, exvor, jimsantelmo, ralph, rickyrockrat |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 2.6.24.2 | Subsystem: | |
| Regression: | Yes | Bisected commit-id: | |
Description
Doug Springer
2008-04-14 08:08:32 UTC
I marked this as a regression (which I assume it is)

No, it actually has been an ongoing problem. It goes at least as far back as 2.6.10, and extends to at least 2.6.24.2, and I have not as yet tested the latest kernel, whatever that is. Point me to a kernel source tree and I'll try it, but I suspect this is a root cause of a very long Ubuntu forum with many people having lockups. I realize that's like saying hundreds of people have problems with their car starting where most of them are not problems with the engine.

Thanks

--- bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=10453
>
> akpm@osdl.org changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>          Regression|0                           |1
>
> ------- Comment #1 from akpm@osdl.org 2008-04-14 12:54 -------
> I marked this as a regression (which I assume it is)

Reply-To: akpm@linux-foundation.org

On Mon, 14 Apr 2008 13:48:19 -0700 (PDT) Doug the RockRat <rickyrockrat@yahoo.com> wrote:

> No, it actually has been an ongoing problem. It goes
> at least as far back as 2.6.10, and extends to at
> least 2.6.24.2,

hm.

> and I have not as yet tested the
> latest kernel, whatever that is. Point me to a kernel
> source tree and I'll try it,

ftp://ftp.kernel.org/pub/linux/kernel/v2.6/testing/linux-2.6.25-rc9.tar.gz would suit, but I'd be surprised if this was magically fixed.

> but I suspect this is a
> root cause of a very long Ubuntu forum with many
> people having lockups.

A link to that discussion might be useful, please?
Here is the link: http://ubuntuforums.org/showthread.php?t=412125

Bear in mind that:

1) some of these are not related - I suspect folks have some app killing their mem and doing massive swapping, so it only *appears* to freeze.
2) Ubuntu doesn't provide a catch-all syslog to even have the ability to catch this happening.

This has been going on (for me, at least, for over a year), and according to this list, happens with both Nvidia and ATI drivers, both free (at least for NV) and binary. Most people run with i386, not x86_64, for other compatibility reasons, and I have a laptop running a 64 bit, duo-core system that does not seem to have this problem. That is to say, duo-core systems on i386 do not appear to have this little feature.

Here is another forum with 64-bit lockups with another version of Ubuntu: http://ubuntuforums.org/showthread.php?t=587905

if the problem is reproducible, try the nmi_watchdog=2 boot parameter and try to provoke the lockup in text mode - you should get an NMI printout within a minute after the hard lockup.

Ingo

I wouldn't call it reproducible, since it can take anywhere from a day to a week and a half to happen. It is not related to any usage pattern I can see: sometimes it happens while the computer has not been used for hours, sometimes it happens while using it. Since it is my main dev box, I can't run in text mode for a week, but I will try the boot param next time I take the box down or it locks, whichever is first.

* bugme-daemon@bugzilla.kernel.org <bugme-daemon@bugzilla.kernel.org> wrote:

> I wouldn't call it reproducible, since it can take anywhere from a day
> to a week and a half to happen. It is not related to any usage pattern
> I can see, sometimes it happens while the computer has not been used
> for hours, sometimes it happens while using it. Since it is my main
> dev box, I can't run in text mode for a week, but I will try the boot
> param next time I take the box down, or it locks whichever is first.
if you need graphics mode and if you have another box on the same LAN then you could try netconsole logging - the NMI watchdog should be able to print via the netconsole too. (i've done such backtraces myself)

but ... this type of bug is one of the most difficult ones to track down.

Ingo

In addition to those Ubuntu forum threads, I think the matching Ubuntu bug report is https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/157777

This may be related to the same issue I have with my tablet, which I just purchased and am trying to build an LFS system on. If I run ACPI at all in x86_64 mode I get a hard lock after only a few minutes, mostly when I attempt to untar something or list a directory. I have been investigating what the problem is, but sadly I have yet to find a solution other than running a non-ACPI kernel or using the noacpi kernel option. Note I do get another message about TSC being an unstable time source. This problem does not occur in 32-bit mode. My computer is an HP tx2110us; the kernel I'm running is 2.6.24.

I've been plagued by this for a long time, but it seems to have gotten worse recently, especially in 2.6.28 and the 29 RCs. I'm seeing it while running X (with the nvidia module), running a browser, and moving a USB mouse. I think in my version there is a USB component, as I am never seeing it while typing, only when I am actually moving the mouse. Hard hang, responds to nothing.

Just finally got around to searching "NMI watchdog" and discovered I need to add "nmi_watchdog=1" (1 for SMP systems) for the NMI watchdog to work. Let's hope for some output.

I take that back. nmi_watchdog didn't work; the kernel complained about stuck NMI interrupts. I'm now using "nmi_watchdog=panic,lapic" per Documentation/kernel-parameters.txt (not quite sure what the "panic" is for, but it sounds good). NMI interrupts are showing in /proc/interrupts, so maybe I'll learn something next time it hangs, if I can figure out how to get the panic output.
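The check described above (confirming that NMI interrupts show up in /proc/interrupts after enabling the watchdog) can be scripted. The following is only a minimal sketch: the function names are hypothetical, and it assumes the usual /proc/interrupts layout in which a line starting with `NMI:` carries one counter per CPU followed by a text description.

```python
def parse_nmi_counts(interrupts_text):
    """Return the per-CPU NMI counters found in /proc/interrupts text."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            for field in fields[1:]:
                if not field.isdigit():
                    break  # reached the trailing description text
                counts.append(int(field))
            return counts
    return []  # no NMI line present


def nmi_deltas(before_text, after_text):
    """Return how much each CPU's NMI counter grew between two snapshots."""
    before = parse_nmi_counts(before_text)
    after = parse_nmi_counts(after_text)
    return [a - b for b, a in zip(before, after)]


def read_interrupts(path="/proc/interrupts"):
    """Read the current interrupt table (Linux only)."""
    with open(path) as handle:
        return handle.read()
```

Taking two snapshots with `read_interrupts()` a few seconds apart and passing them to `nmi_deltas()` shows whether the counters are moving at all. Counters that stay at zero on every CPU would suggest the watchdog is not actually running.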
The xfs people had an in-kernel debugger which might be handy if the NMI code could be made to force the console into text mode and invoke the debugger.

Closing as obsolete, please re-open if this is incorrect
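For anyone revisiting the netconsole suggestion made earlier in this thread: the second box on the LAN needs something listening for the UDP packets the hung machine sends. The sketch below is one possible receiver, not a tested recipe for this bug. The function names are hypothetical, and port 6666 is assumed to match the target port given in the `netconsole=` parameter on the machine being debugged.

```python
import socket


def open_listener(port=6666, bind_addr="0.0.0.0"):
    """Open a UDP socket for incoming netconsole packets.

    The port is an assumption; it must match the target port configured
    on the sending machine.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, port))
    return sock


def log_messages(sock, log_path, max_packets=None):
    """Append each received datagram to log_path.

    The file is flushed after every packet so the last lines before a
    hard lock are already on disk. Runs until max_packets datagrams
    have arrived, or forever if max_packets is None.
    """
    received = 0
    with open(log_path, "ab") as log:
        while max_packets is None or received < max_packets:
            data, _sender = sock.recvfrom(65535)
            log.write(data)
            log.flush()
            received += 1
    return received
```

Running `log_messages(open_listener(), "netconsole.log")` on the receiving box would leave a file to inspect after the next lockup. UDP gives no delivery guarantee, so a missing backtrace does not prove that nothing was printed.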