Bug 5960

Summary: NMI 35 Causes Reboot
Product: Drivers
Component: Sound(ALSA)
Reporter: David L. Craig (dlc)
Assignee: Jaroslav Kysela (perex)
Status: RESOLVED WILL_NOT_FIX
Severity: normal
Priority: P2
CC: alan, bunk, kernel, mulix, protasnb, zwane
Hardware: i386
OS: Linux
Kernel Version: 2.6.16-rc1
Subsystem:
Regression: No
Bisected commit-id:
Attachments:
  dmesg output
  kernel.log
  strace of aplaymidi
  kernel log
  rawmidi stderr messages
  rawmidi strace output
  Messages from last boot
  Hypervisor System Error Log entries

Description David L. Craig 2006-01-25 14:58:38 UTC
Most recent kernel where this bug did not occur: Problem exists in all kernels
tested.
Distribution: Gentoo (bug 119228)
Hardware Environment: IBM Netfinity 8664-6RY with M-Audio Delta 1010 sound card
Software Environment: ice1712, no jack, rosegarden
Problem Description: Starting rosegarden, as an ordinary user or as root,
triggers a reboot within 60 seconds, with the kernel message:
  Uhhuh.  NMI received for unknown reason 35 on CPU 0.
Steps to reproduce:  The problem can also be induced by connecting the Delta's
MIDI port to ardour's seq via qjackctl while running as root.
Comment 1 David L. Craig 2006-01-25 15:08:43 UTC
Created attachment 7151 [details]
dmesg output
Comment 2 David L. Craig 2006-01-25 15:18:40 UTC
Created attachment 7152 [details]
kernel.log
Comment 3 Daniel Drake 2006-01-25 15:44:38 UTC
Downstream bug at http://bugs.gentoo.org/show_bug.cgi?id=119228
We tried BIOS upgrade (none available). ACPI and APM are off, so it's not
devices panicking when they get turned off.
Comment 4 Zwane Mwaikambo 2006-01-26 09:55:15 UTC
Has this card worked recently on any other operating systems?
Comment 5 David L. Craig 2006-01-26 11:32:15 UTC
I don't have any other OSes to try.  It has functioned properly under Linux for
over a year using the lines in and out.  Attempting use of MIDI is new, however.
Comment 6 David L. Craig 2006-01-31 06:58:08 UTC
Opened incident NA17691 with M Audio North America (registration required) to
hopefully learn if the card is at fault via some definitive diagnostic method.
Comment 7 David L. Craig 2006-02-01 21:13:14 UTC
No support from M-Audio will be forthcoming.  It's up to the ice1712 maintainer
to determine what's going on.
Comment 8 Jaroslav Kysela 2006-02-02 00:19:37 UTC
Could you enable CONFIG_DEBUG_SPINLOCK for your kernel? Perhaps it's a spinlock
problem.

Also, could you reproduce the problem with aseqdump and aplaymidi command line
tools? I'd like to track down the simplest way to reproduce NMI.
Comment 9 David L. Craig 2006-02-03 15:50:44 UTC
$ cat /proc/asound/seq/clients
Client info
  cur  clients : 3
  peak clients : 3
  max  clients : 192

Client   0 : "System" [Kernel]
  Port   0 : "Timer" (Rwe-)
  Port   1 : "Announce" (R-e-)
Client  16 : "M Audio Delta 1010 MIDI" [Kernel]
  Port   0 : "M Audio Delta 1010 MIDI" (RWeX)
$ date
Fri Feb  3 18:29:45 EST 2006
$ aplaymidi --port=16 music/mmp1.mid
$ date
Fri Feb  3 18:30:10 EST 2006

Repeated date commands got up to 18:30:47 before the screen cleared for the reboot.

The kernel log has:
Feb  3 16:07:22 [kernel] [98314.950339] eth0: link up, 100Mbps, full-duplex, lpa 0x45E1
Feb  3 18:29:47 [kernel] [106855.586289] Uhhuh. NMI received for unknown reason 25 on CPU 0.
Feb  3 18:33:05 [kernel] [4294667.296000] Linux version 2.6.16-rc1 (root@muse) (gcc version 3.3.6 (Gentoo 3.3.6, ssp-3.3.6-1.0, pie-8.7.8)) #2 SMP PREEMPT Thu Feb 2 11:55:49 EST 2006

Thus the NMI occurred two seconds into the aplaymidi.

I built the kernel with CONFIG_DEBUG_SPINLOCK.  Is there something I need to do
to activate it and/or display its information?
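
The two-second figure can be checked directly from the timestamps quoted in this comment. The following is a minimal Python sketch, assuming syslog-style "Mon DD HH:MM:SS" stamps and the year 2006 (the log lines themselves carry no year); the helper name seconds_between is illustrative only.

```python
from datetime import datetime


def seconds_between(start, end, year=2006):
    """Return end - start in seconds for syslog-style 'Mon DD HH:MM:SS' stamps."""
    fmt = "%Y %b %d %H:%M:%S"
    # split()/join() collapses the double space syslog uses for one-digit days.
    t0 = datetime.strptime(f"{year} " + " ".join(start.split()), fmt)
    t1 = datetime.strptime(f"{year} " + " ".join(end.split()), fmt)
    return (t1 - t0).total_seconds()


# `date` output just before aplaymidi was started, and the NMI log entry.
nmi_delay = seconds_between("Feb  3 18:29:45", "Feb  3 18:29:47")
# NMI log entry, and the last `date` output seen before the screen cleared.
reboot_delay = seconds_between("Feb  3 18:29:47", "Feb  3 18:30:47")
print(nmi_delay, reboot_delay)
```

The second value matches the roughly 60-second gap between the NMI message and the reboot that is described elsewhere in this report.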
Comment 10 David L. Craig 2006-02-03 21:03:59 UTC
Created attachment 7230 [details]
strace of aplaymidi
Comment 11 Jaroslav Kysela 2006-02-07 05:00:34 UTC
Ok, can you reproduce the bug with the rawmidi test program (cd alsa-lib/test ;
make rawmidi) ? Try 'rawmidi' and 'rawmidi -i hw:0,0 -o hw:0,0'...

Because the error does not occur during the command execution, it is probably
not spinlock related. I need to check whether it is related to the sequencer or
to raw MIDI access.
Comment 12 David L. Craig 2006-02-15 09:25:09 UTC
rawmidi triggers the NMI for either input or output.  I put some code in to
verify that the NMI arises from the snd_rawmidi_open call by putting rawmidi to
sleep for ten seconds after the function's return.  I thought about adding some
tracing code to ice1712 but wasn't sure what to mask for to get MIDI-related inb
and outb data only.  I'll attach the rawmidi log, strace files, and kernel log
files.
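
The open-then-sleep isolation described in this comment can be approximated without modifying the alsa-lib test program. The following is only a sketch under stated assumptions: it opens the raw MIDI device node directly instead of calling snd_rawmidi_open, so it does not follow exactly the same code path, and it assumes hw:0,0 is exposed as /dev/snd/midiC0D0. The function name open_and_hold is hypothetical.

```python
import os
import time


def open_and_hold(path, hold_seconds=10.0, sleep=time.sleep):
    """Open `path` write-only, idle for `hold_seconds`, close it, and
    return the elapsed wall-clock time in seconds."""
    start = time.monotonic()
    fd = os.open(path, os.O_WRONLY)  # the only device access made here
    try:
        print(f"opened {path}; idling for {hold_seconds} s", flush=True)
        sleep(hold_seconds)          # no MIDI bytes are ever written
    finally:
        os.close(fd)
    return time.monotonic() - start
```

Calling open_and_hold("/dev/snd/midiC0D0") and watching the kernel log during the idle period would show whether the open alone is enough to raise the NMI, since nothing else touches the device before it is closed.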
Comment 13 David L. Craig 2006-02-15 09:49:40 UTC
Created attachment 7347 [details]
kernel log
Comment 14 David L. Craig 2006-02-15 09:51:33 UTC
Created attachment 7348 [details]
rawmidi stderr messages
Comment 15 David L. Craig 2006-02-15 09:52:35 UTC
Created attachment 7349 [details]
rawmidi strace output
Comment 16 Natalie Protasevich 2007-09-04 17:39:47 UTC
What is the status of this problem? Does it still exist with newer kernels?
Thanks.
Comment 17 David L. Craig 2007-09-05 07:42:51 UTC
I replaced the M-Audio Delta 1010 with another and have not experienced the problem since.  The unit was apparently defective.
Comment 18 David L. Craig 2008-01-01 15:55:06 UTC
Well, I finally got an RT kernel (2.6.23.11-rt14) functioning to support Rosegarden only to start getting NMI 35s again with the new M-Audio Delta 1010 card.  So either two cards have the same defect or there's something about my environment (IBM Netfinity server) that doesn't play nice with the card.  I'll post on the LAU list to see if anybody is successfully using this card with Rosegarden.  And I'll roll up my sleeves to see if I can isolate the catalyst as I did before (possible now that I've finally found this report).
Comment 19 David L. Craig 2008-01-01 20:20:15 UTC
Executing rawmidi causes the NMI as before.  The kernel messages follow:

Jan  1 23:00:11 muse kernel: [ 5551.781894] Uhhuh. NMI received for unknown reason 25 on CPU 0.
Jan  1 23:00:11 muse kernel: [ 5551.781906] Do you have a strange power saving mode enabled?
Jan  1 23:00:11 muse kernel: [ 5551.781913] Dazed and confused, but trying to continue
Jan  1 23:00:11 muse kernel: [ 5552.193162] Clocksource tsc unstable (delta = 6122311177 ns)

The reboot occurs 60 seconds after this.
Comment 20 Natalie Protasevich 2008-01-02 01:17:01 UTC
The board documentation should describe the meaning of the NMI status/control register and specify what 35 means. Oddly, it has changed to 25 now.
I noticed that you are using oprofile, which utilizes NMI. It might cause problems on some chipsets. Do you always enable oprofile? Have you tried turning it off?
Can you also attach dmesg for the latest kernel boot, please?
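
The two reason values can be compared bit by bit. The following is a minimal Python sketch, assuming the kernel prints the byte it read from I/O port 0x61 in hexadecimal and that the board follows the conventional PC/AT layout of that register; the bit names are paraphrased from that convention, not taken from the documentation for this particular board, and the function names are illustrative.

```python
# Conventional PC/AT layout of the NMI status/control register (port 0x61).
# Assumption: this board follows the standard layout.
PORT_61_BITS = {
    0x80: "SERR# / memory parity NMI status",
    0x40: "IOCHK# NMI status",
    0x20: "timer 2 output",
    0x10: "refresh toggle",
    0x08: "IOCHK# NMI disabled",
    0x04: "SERR# / parity NMI disabled",
    0x02: "speaker data enable",
    0x01: "timer 2 gate enable",
}


def decode_nmi_reason(text):
    """Name the bits set in a reason value given as a hex string."""
    value = int(text, 16)
    return [name for mask, name in PORT_61_BITS.items() if value & mask]


def differing_bits(a, b):
    """Name the bits that differ between two hex reason values."""
    diff = int(a, 16) ^ int(b, 16)
    return [name for mask, name in PORT_61_BITS.items() if diff & mask]


for reason in ("35", "25"):
    print(reason, decode_nmi_reason(reason))
print("differs in:", differing_bits("35", "25"))
```

Under these assumptions the two values differ only in the refresh toggle bit, which changes state continuously, so 35 and 25 may well describe the same condition. Neither value has the 0x80 or 0x40 status bit set, which is consistent with the kernel reporting the reason as unknown.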
Comment 21 David L. Craig 2008-01-02 16:11:28 UTC
Created attachment 14268 [details]
Messages from last boot

Along with the kernel change, I also am running under a different distro: Debian Sid.  I'm not using oprofile now.
Comment 22 David L. Craig 2008-01-05 15:25:09 UTC
Date: Sat, 5 Jan 2008 14:52:50 -0500
From: Joe Hartley <jh@brainiac.com>
To: linux-audio-user@lists.linuxaudio.org

Subject: Re: [LAU] Delta 1010 and MIDI

I have a 1010 and have run Rosegarden, though I also had to run timidity
since the 1010 does not have any sort of on-board support for generating
sounds from MIDI playback.

I've also successfully sent MIDI out from the 1010's MIDI ports to a
synthesizer, though that was some time ago.
--
======================================================================
       Joe Hartley - UNIX/network Consultant - jh@brainiac.com
Without deviation from the norm, "progress" is not possible. - FZappa
Comment 23 David L. Craig 2008-01-10 16:50:34 UTC
Created attachment 14406 [details]
Hypervisor System Error Log entries

I just realized I never appended the log entries from the System Error Log maintained by the POWER Hypervisor of the Netfinity.  I had done so for other bug reports and thought I had for this one, too.  Maybe one of the kernel maintainers has access to the code documentation and/or insight into the timing of the entries.  I'd guess that PHYP is reflecting an NMI to the X386 system because it doesn't know any better way to deal with the error.  At least this could explain why other systems don't generate NMIs when opening MIDI ports for this card.
Comment 24 David L. Craig 2008-10-18 16:48:20 UTC
I found a forum inside IBM where I posted a thread about this problem.  Hopefully someone sufficiently knowledgeable will respond.  Visit http://www.ibm.com/developerworks/forums/thread.jspa?threadID=230355 for details.
Comment 25 Alan 2008-12-01 09:06:24 UTC
Hardware SERR from the card. Basically, the audio card flagged up an error for some reason. On most PC hardware SERR is ignored, so it wouldn't even be seen.
Comment 26 Alan 2010-01-19 21:15:52 UTC
I think the reality is that nobody can fix this without the exact hardware and bus analysers.