Kernel Bug Tracker – Bug 5960
NMI 35 Causes Reboot
Last modified: 2010-01-19 21:15:52 UTC
Most recent kernel where this bug did not occur: Problem exists in all kernels
Distribution: Gentoo (bug 119228)
Hardware Environment: IBM Netfinity 8664-6RY with M-Audio Delta 1010 sound card
Software Environment: ice1712, no jack, rosegarden
Problem Description: Starting rosegarden, as an ordinary user or as root, induces
a reboot within 60 seconds, preceded by the kernel message:
Uhhuh. NMI received for unknown reason 35 on CPU 0.
Steps to reproduce: The problem can also be induced by connecting the Delta's
MIDI to ardour's seq via qjackctl as root.
Created attachment 7151 [details]
Created attachment 7152 [details]
Downstream bug at http://bugs.gentoo.org/show_bug.cgi?id=119228
We looked for a BIOS upgrade (none available). ACPI and APM are off, so it's not
devices panicking when they get turned off.
Has this card worked recently on any other operating systems?
I don't have any other OSes to try. It has functioned properly under Linux for
over a year using the lines in and out. Attempting use of MIDI is new, however.
Opened incident NA17691 with M Audio North America (registration required) to
hopefully learn if the card is at fault via some definitive diagnostic method.
No support from M-Audio will be forthcoming. It's up to the ice1712 maintainer
to determine what's going on.
Could you enable CONFIG_DEBUG_SPINLOCK for your kernel? Perhaps it's a spinlock
Also, could you reproduce the problem with aseqdump and aplaymidi command line
tools? I'd like to track down the simplest way to reproduce NMI.
$ cat /proc/asound/seq/clients
cur clients : 3
peak clients : 3
max clients : 192
Client 0 : "System" [Kernel]
Port 0 : "Timer" (Rwe-)
Port 1 : "Announce" (R-e-)
Client 16 : "M Audio Delta 1010 MIDI" [Kernel]
Port 0 : "M Audio Delta 1010 MIDI" (RWeX)
Fri Feb 3 18:29:45 EST 2006
$ aplaymidi --port=16 music/mmp1.mid
Fri Feb 3 18:30:10 EST 2006
Repeated date commands got up to 18:30:47 before the screen cleared for the reboot.
The kernel log has:
Feb 3 16:07:22 [kernel] [98314.950339] eth0: link up, 100Mbps, full-duplex, lpa
Feb 3 18:29:47 [kernel] [106855.586289] Uhhuh. NMI received for unknown reason
25 on CPU 0.
Feb 3 18:33:05 [kernel] [4294667.296000] Linux version 2.6.16-rc1 (root@muse)
(gcc version 3.3.6 (Gentoo 3.3.6, ssp-3.3.6-1.0, pie-8.7.8)) #2 SMP PREEMPT Thu
Feb 2 11:55:49 EST 2006
Thus the NMI occurred two seconds into the aplaymidi.
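The timing above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the shell's date output and the kernel log timestamps come from the same clock; the variable names are only for illustration.

```python
from datetime import datetime

FMT = "%H:%M:%S"
started = datetime.strptime("18:29:45", FMT)    # date output just before aplaymidi
nmi = datetime.strptime("18:29:47", FMT)        # kernel log timestamp of the NMI
last_seen = datetime.strptime("18:30:47", FMT)  # last date output before the screen cleared

nmi_delay = (nmi - started).total_seconds()
reboot_delay = (last_seen - nmi).total_seconds()
print(f"NMI after {nmi_delay:.0f} s, reboot at least {reboot_delay:.0f} s after the NMI")
```

The second figure is a lower bound, since the screen cleared some time after the last date command returned.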
I built the kernel with CONFIG_DEBUG_SPINLOCK. Is there something I need to do
to activate it and/or display its information?
Created attachment 7230 [details]
strace of aplaymidi
Ok, can you reproduce the bug with the rawmidi test program (cd alsa-lib/test ;
make rawmidi) ? Try 'rawmidi' and 'rawmidi -i hw:0,0 -o hw:0,0'...
Because the error does not occur during the command execution, it is probably
not spinlock related. I need to check whether it is related to the sequencer or
to raw MIDI access.
rawmidi spawns the interrupt for input or output. I put some code in to verify
the NMI arises out of the snd_rawmidi_open trigger by putting rawmidi to sleep
for ten seconds after the function's return. I thought about adding some
tracing code to ice1712 but wasn't sure what to mask for to get MIDI-related inb
and outb data only. I'll attach the rawmidi log, strace files, and kernel log.
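For anyone wanting to repeat the open-then-sleep experiment without rebuilding alsa-lib's test program, something like the following might do. It is only a sketch, not the modified rawmidi used here: it assumes that hw:0,0 corresponds to the usual /dev/snd/midiC0D0 device node, it opens the node directly instead of calling snd_rawmidi_open, and rawmidi_node and open_and_wait are hypothetical helper names.

```python
import os
import time

def rawmidi_node(card, device):
    """Device node that the ALSA name hw:<card>,<device> is assumed to map to."""
    return f"/dev/snd/midiC{card}D{device}"

def open_and_wait(card=0, device=0, seconds=10.0):
    """Open the raw MIDI node, then idle so an NMI can be attributed to the open."""
    path = rawmidi_node(card, device)
    fd = os.open(path, os.O_RDWR)  # read/write, so both directions are opened
    try:
        print(f"opened {path}, sleeping {seconds} s")
        time.sleep(seconds)        # nothing is written or read during this window
    finally:
        os.close(fd)
```

Calling open_and_wait() on the affected machine and watching the kernel log during the sleep would show whether the open alone is enough to raise the NMI.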
Created attachment 7347 [details]
Created attachment 7348 [details]
rawmidi stderr messages
Created attachment 7349 [details]
rawmidi strace output
What is the status of this problem? Does it still exist with new kernels?
I replaced the M-Audio Delta 1010 with another and have not experienced the problem since. The unit was apparently defective.
Well, I finally got an RT kernel (220.127.116.11-rt14) functioning to support Rosegarden only to start getting NMI 35s again with the new M-Audio Delta 1010 card. So either two cards have the same defect or there's something about my environment (IBM Netfinity server) that doesn't play nice with the card. I'll post on the LAU list to see if anybody is successfully using this card with Rosegarden. And I'll roll up my sleeves to see if I can isolate the catalyst as I did before (possible now that I've finally found this report).
Executing rawmidi causes the NMI as before. The kernel messages follow:
Jan 1 23:00:11 muse kernel: [ 5551.781894] Uhhuh. NMI received for unknown reason 25 on CPU 0.
Jan 1 23:00:11 muse kernel: [ 5551.781906] Do you have a strange power saving mode enabled?
Jan 1 23:00:11 muse kernel: [ 5551.781913] Dazed and confused, but trying to continue
Jan 1 23:00:11 muse kernel: [ 5552.193162] Clocksource tsc unstable (delta = 6122311177 ns)
The reboot occurs 60 seconds after this.
The board documentation should describe the meaning of the NMI status/control register and specify what 35 means. Oddly, it has changed to 25 now.
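As a rough aid until the board documentation turns up, the two values can be decoded against the conventional PC/AT layout of I/O port 0x61, which is where the kernel reads the reason byte (it is printed in hexadecimal). The bit names below are an assumption based on that conventional layout and may not match this board; decode_reason and differing_bits are illustrative helpers.

```python
# Conventional PC/AT meaning of the bits in port 0x61 (assumed, not board-verified).
PORT_61_BITS = {
    7: "SERR# / memory parity NMI status",
    6: "IOCHK# NMI status",
    5: "timer 2 output status",
    4: "refresh cycle toggle",
    3: "IOCHK# NMI clear/disable",
    2: "SERR# NMI clear/disable",
    1: "speaker data enable",
    0: "timer 2 gate enable",
}

def decode_reason(text):
    """Return the names of the bits set in a reason byte given as a hex string."""
    value = int(text, 16)
    return [PORT_61_BITS[bit] for bit in range(7, -1, -1) if (value >> bit) & 1]

def differing_bits(a, b):
    """Return the bit positions at which two hex reason bytes differ."""
    diff = int(a, 16) ^ int(b, 16)
    return [bit for bit in range(8) if (diff >> bit) & 1]

for reason in ("35", "25"):
    print(reason, decode_reason(reason))
print("bits that differ:", differing_bits("35", "25"))
```

Under that assumed layout the two values differ only in bit 4, which toggles continuously on its own, so the change from 35 to 25 would not by itself indicate a different cause. Neither value has bit 7 or bit 6 set, which is consistent with the kernel reporting the reason as unknown.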
I noticed that you are using oprofile, which utilizes NMI. It might cause problems on some chipsets. Do you always enable oprofile? Have you tried turning it off?
Can you also attach dmesg for the latest kernel boot, please?
Created attachment 14268 [details]
Messages from last boot
Along with the kernel change, I also am running under a different distro: Debian Sid. I'm not using oprofile now.
Date: Sat, 5 Jan 2008 14:52:50 -0500
From: Joe Hartley <firstname.lastname@example.org>
Subject: Re: [LAU] Delta 1010 and MIDI
I have a 1010 and have run Rosegarden, though I also had to run timidity
since the 1010 does not have any sort of on-board support for generating
sounds from MIDI playback.
I've also successfully sent MIDI out from the 1010's MIDI ports to a
synthesizer, though that was some time ago.
Joe Hartley - UNIX/network Consultant - email@example.com
Without deviation from the norm, "progress" is not possible. - FZappa
Created attachment 14406 [details]
Hypervisor System Error Log entries
I just realized I never appended the log entries from the System Error Log maintained by the POWER Hypervisor of the Netfinity. I had done so for other bug reports and thought I had for this one, too. Maybe one of the kernel maintainers has access to the code documentation and/or insight into the timing of the entries. I'd guess that PHYP is reflecting an NMI to the X386 system because it doesn't know any better way to deal with the error. At least this could explain why other systems don't generate NMIs when opening MIDI ports for this card.
I found a forum inside IBM where I posted a thread about this problem. Hopefully someone sufficiently knowledgeable will respond. Visit http://www.ibm.com/developerworks/forums/thread.jspa?threadID=230355 for details.
Hardware SERR from the card. Basically the audio card flagged up an error for some reason. On most PC hardware SERR is ignored, so it wouldn't even be seen.
I think the reality is nobody can fix this without the exact hardware and bus analysers.