Bug 8058

Summary: 2.6.21-rc1 Panic (in forcedeth driver?)
Product: Drivers Reporter: Albert Hopkins (kernel)
Component: Network Assignee: Ayaz Abdulla (aabdulla)
Status: RESOLVED PATCH_ALREADY_AVAILABLE    
Severity: high CC: bunk, cw
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.21-rc1 Subsystem:
Regression: --- Bisected commit-id:
Attachments: Panic output and kernel config
picture of kernel panic on monitor
Photo of panic from 2.6.21-rc2
Kernel Oops in 2.6.21-rc3
Fixed calling the interrupt routine based on descriptor version.

Description Albert Hopkins 2007-02-22 09:32:17 UTC
Most recent kernel where this bug did *NOT* occur: 2.6.20
Distribution: Gentoo Linux (x86)
Hardware Environment: ASUS A8N-SLI DELUXE
Software Environment: Kernel NFS/nfs-utils 1.0.10
Problem Description: The NFS file server holds MythTV files and streams video
over NFS.  Recording shows (i.e. writing to the NFS server) appears to work
fine, as does watching recorded shows (reading from the NFS server).  However,
when I stream .avi videos directly from the NFS server I get a panic, usually
when I stop watching (close()?) the file.

Steps to reproduce:
* Using mythtv, browse a video. Videos are housed on the NFS server.
* Watch video.  This usually (but not always) goes fine.
* Press escape on mythtv (NFS client) to stop watching video
* Kernel panics 90% of the time.

I've tried this with and without the new dynamic ticks.  It occurs either way. 
Most (small?) network traffic appears fine. All other kernel opts are pretty
much the same as the working 2.6.20.

# cat /proc/cmdline 
root=/dev/md4 vga=791 

No initrd/initramfs.

Haven't tried other kinds of heavy traffic (i.e. network backups) but can if
needed.  Let me know if you need any more info.

Panic output transcribed from picture taken from digital camera.
Comment 1 Albert Hopkins 2007-02-22 09:32:59 UTC
Created attachment 10496 [details]
Panic output and kernel config
Comment 2 Adrian Bunk 2007-02-26 15:52:19 UTC
Please attach the photo you have taken from the BUG.
Comment 3 Albert Hopkins 2007-02-27 07:36:38 UTC
Created attachment 10546 [details]
picture of kernel panic on monitor

Not the same pic as from the previous attachment as I no longer have that one. 
This one was generated today.  This time it happened as soon as I began to view
a video file from the NFS client.

Sorry for the poor quality of the photo.  I tried to scale, crop, and sharpen
it.  Let me know if you need the original.

I will also add that I do not get this panic with Linux 2.6.20.1.
Comment 4 Adrian Bunk 2007-02-28 15:53:16 UTC
2.6.20-rc2 contains several forcedeth fixes.

Is your issue also fixed or is it still present?
Comment 5 Adrian Bunk 2007-02-28 15:53:46 UTC
sorry, I meant 2.6.21-rc2
Comment 6 Albert Hopkins 2007-03-01 19:43:21 UTC
I built and tried 2.6.21-rc2.  After a while I thought it was going to be ok but
eventually the kernel oopsed again.
Comment 7 Albert Hopkins 2007-03-01 19:45:29 UTC
Created attachment 10579 [details]
Photo of panic from 2.6.21-rc2

Also wanted to add there is also an onboard Marvell (Yukon) NIC on the
motherboard and when I switch to it I do not have the issue.
Comment 8 Ayaz Abdulla 2007-03-04 22:14:03 UTC
Can we do a quick sanity check by running the broken driver (2.6.21-rc1) on 
the 2.6.20 kernel? This will rule out any kernel changes.

Or, if it's easier, run the driver from 2.6.20 on kernel 2.6.21-rc1.
Comment 9 Albert Hopkins 2007-03-05 11:27:29 UTC
I am leaning toward it not being a generic kernel issue because right now I am
running 2.6.21-rc2 on the same machine, which also has a Marvell NIC, and using
the Yukon driver I do not experience this issue.

But I will try it later when I get the opportunity.
Comment 10 Albert Hopkins 2007-03-06 12:26:39 UTC
Basically I can't get the 2.6.20 forcedeth driver to work with kernel
2.6.21-rc2.  The module will load.  I have loaded it via "insmod
/var/lib/modules/2.6.20/..." as well as copying it into the
/var/lib/modules/2.6.21-rc2/... tree and letting udev handle it.  Results are
the same in both cases. The module loads but does not produce an ethN device
(according to ifconfig -a). "dmesg" reveals the following:

forcedeth: no version for "struct_module" found: kernel tainted.

This is the first time I've loaded a kernel module across kernel versions so
there's always the possibility that I'm doing something wrong.

The following is from "modinfo forcedeth":

filename:       /lib/modules/2.6.21-rc2/kernel/drivers/net/forcedeth.ko
license:        GPL
description:    Reverse Engineered nForce ethernet driver
author:         Manfred Spraul <manfred@colorfullife.com>
alias:          pci:v000010DEd0000054Fsv*sd*bc*sc*i*
alias:          pci:v000010DEd0000054Esv*sd*bc*sc*i*
alias:          pci:v000010DEd0000054Dsv*sd*bc*sc*i*
alias:          pci:v000010DEd0000054Csv*sd*bc*sc*i*
alias:          pci:v000010DEd00000453sv*sd*bc*sc*i*
alias:          pci:v000010DEd00000452sv*sd*bc*sc*i*
alias:          pci:v000010DEd00000451sv*sd*bc*sc*i*
alias:          pci:v000010DEd00000450sv*sd*bc*sc*i*
alias:          pci:v000010DEd000003EFsv*sd*bc*sc*i*
alias:          pci:v000010DEd000003EEsv*sd*bc*sc*i*
alias:          pci:v000010DEd000003E6sv*sd*bc*sc*i*
alias:          pci:v000010DEd000003E5sv*sd*bc*sc*i*
alias:          pci:v000010DEd00000373sv*sd*bc*sc*i*
alias:          pci:v000010DEd00000372sv*sd*bc*sc*i*
alias:          pci:v000010DEd00000269sv*sd*bc*sc*i*
alias:          pci:v000010DEd00000268sv*sd*bc*sc*i*
alias:          pci:v000010DEd00000038sv*sd*bc*sc*i*
alias:          pci:v000010DEd00000037sv*sd*bc*sc*i*
alias:          pci:v000010DEd00000057sv*sd*bc*sc*i*
alias:          pci:v000010DEd00000056sv*sd*bc*sc*i*
alias:          pci:v000010DEd000000DFsv*sd*bc*sc*i*
alias:          pci:v000010DEd000000E6sv*sd*bc*sc*i*
alias:          pci:v000010DEd0000008Csv*sd*bc*sc*i*
alias:          pci:v000010DEd00000086sv*sd*bc*sc*i*
alias:          pci:v000010DEd000000D6sv*sd*bc*sc*i*
alias:          pci:v000010DEd00000066sv*sd*bc*sc*i*
alias:          pci:v000010DEd000001C3sv*sd*bc*sc*i*
depends:        
vermagic:       2.6.20.1 SMP preempt mod_unload K8 
parm:           max_interrupt_work:forcedeth maximum events handled per
interrupt (int)
parm:           optimization_mode:In throughput mode (0), every tx & rx packet
will generate an interrupt. In CPU mode (1), interrupts are controlled by a
timer. (int)
parm:           poll_interval:Interval determines how frequent timer interrupt
is generated by [(time_in_micro_secs * 100) / (2^10)]. Min is 0 and Max is
65535. (int)
parm:           msi:MSI interrupts are enabled by setting to 1 and disabled by
setting to 0. (int)
parm:           msix:MSIX interrupts are enabled by setting to 1 and disabled by
setting to 0. (int)
parm:           dma_64bit:High DMA is enabled by setting to 1 and disabled by
setting to 0. (int)
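
The vermagic field above reports 2.6.20.1 while the running kernel in this comment is 2.6.21-rc2. A minimal sketch of how that comparison could be scripted follows; the function names vermagic_matches and check_module are hypothetical helpers, and the sketch only compares the first word of the vermagic string with the kernel release. It does not claim that this mismatch is the reason no ethN device appeared.

```shell
#!/bin/sh
# Hypothetical helpers: compare a module's vermagic with a kernel release.

# Succeeds (returns 0) when the first word of the vermagic string
# equals the given kernel release.
vermagic_matches() {
    vermagic=$1
    release=$2
    # Strip everything from the first space onward, leaving the version.
    built_for=${vermagic%% *}
    [ "$built_for" = "$release" ]
}

# Reads the vermagic of a .ko file with modinfo and compares it with the
# release of the running kernel as reported by uname -r.
check_module() {
    module_path=$1
    vermagic=$(modinfo -F vermagic "$module_path") || return 2
    if vermagic_matches "$vermagic" "$(uname -r)"; then
        echo "match: $vermagic"
    else
        echo "mismatch: built for ${vermagic%% *}, running $(uname -r)"
        return 1
    fi
}
```

For example, after sourcing the functions, check_module could be called with the path shown in the filename field above.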


Comment 11 Albert Hopkins 2007-03-07 14:39:23 UTC
Created attachment 10644 [details]
Kernel Oops in 2.6.21-rc3

I downloaded the rc3 patch today and installed it.  Basically the same thing
happens.  I'll stream a video over NFS, it'll play for maybe a minute then
oops.

No problems however with the 2.6.21-rc* Yukon driver or the 2.6.20 forcedeth
one.

This oops looks like it didn't get a chance to finish writing to the console.
Comment 12 Ayaz Abdulla 2007-03-11 20:39:38 UTC
Created attachment 10704 [details]
Fixed calling the interrupt routine based on descriptor version.

Please try out this patch and let me know if it fixes this issue.
Comment 13 Ayaz Abdulla 2007-03-20 12:09:12 UTC
Albert, could you please try out the new patch? Thanks.
Comment 14 Albert Hopkins 2007-03-21 06:17:59 UTC
Sorry, I've been busy.  Will try to look at the patch today.
Comment 15 Albert Hopkins 2007-03-22 11:13:26 UTC
Applied the patch against 2.6.21-rc4.  So far after ~2 hours streaming via NFS
there have been no oopses.  However, I am getting a multitude of the following
kernel message:

eth0: too many iterations (6) in nv_nic_irq.
Comment 16 Ayaz Abdulla 2007-03-22 12:20:40 UTC
Great!

Those messages are benign. You can bump up the module parameter
max_interrupt_work to a higher value, for example 10. It just means that
there is a lot of traffic flowing.
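
One possible way to make that setting persistent is an "options" line in the modprobe configuration. The sketch below is an illustration under assumptions: write_forcedeth_options is a hypothetical helper, and the configuration file path is passed in by the caller because its location varies between distributions and is not given in this report.

```shell
#!/bin/sh
# Hypothetical helper: write a modprobe "options" line that sets
# max_interrupt_work for the forcedeth module.
write_forcedeth_options() {
    value=$1
    conf_file=$2
    # Reject empty or non-numeric values before touching the file.
    case $value in
        ''|*[!0-9]*)
            echo "max_interrupt_work must be a non-negative integer" >&2
            return 1
            ;;
    esac
    printf 'options forcedeth max_interrupt_work=%s\n' "$value" > "$conf_file"
}
```

The new value only takes effect when the module is next loaded. For a one-off test the parameter can instead be given on the command line, as in "modprobe forcedeth max_interrupt_work=10"; note that unloading the driver first drops the network link, which matters on a machine reached over that interface.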
Comment 17 Albert Hopkins 2007-03-22 13:49:01 UTC
Ok, thanks.  In that case feel free to close this one out whenever.