Most recent kernel where this bug did *NOT* occur: 2.6.20
Distribution: Gentoo Linux (x86)
Hardware Environment: ASUS A8N-SLI DELUXE
Software Environment: Kernel NFS/nfs-utils 1.0.10
Problem Description: The NFS file server holds MythTV files and streams video over NFS. Recording shows (i.e. writing to the NFS server) appears to work fine, as does watching recorded shows (reading from the NFS server). However, when I stream .avi videos directly from the NFS server I get a panic, usually when I stop watching (close()?) the file.

Steps to reproduce:
* Using mythtv, browse to a video. Videos are housed on the NFS server.
* Watch the video. This usually (but not always) goes fine.
* Press escape on mythtv (NFS client) to stop watching the video.
* Kernel panics 90% of the time.

I've tried this with and without the new dynamic ticks. It occurs either way. Most (small?) network traffic appears fine. All other kernel options are pretty much the same as in the working 2.6.20.

# cat /proc/cmdline
root=/dev/md4 vga=791

No initrd/initramfs. I haven't tried other kinds of heavy traffic (e.g. network backups) but can if needed. Let me know if you need any more info. The panic output was transcribed from a picture taken with a digital camera.
Created attachment 10496 [details] Panic output and kernel config
Please attach the photo you took of the BUG output.
Created attachment 10546 [details] picture of kernel panic on monitor
Not the same pic as from the previous attachment as I no longer have that one. This one was generated today. This time it happened as soon as I began to view a video file from the NFS client. Sorry for the poor quality of the photo. I tried to scale, crop, and sharpen it. Let me know if you need the original. I will also add that I do not get this panic with Linux 2.6.20.1.
2.6.20-rc2 contains several forcedeth fixes. Is your issue also fixed or is it still present?
sorry, I meant 2.6.21-rc2
I built and tried 2.6.21-rc2. After a while I thought it was going to be OK, but eventually the kernel oopsed again.
Created attachment 10579 [details] Photo of panic from 2.6.21-rc2
I also wanted to add that there is an onboard Marvell (Yukon) NIC on the motherboard, and when I switch to it I do not have the issue.
Can we do a quick sanity check by running the broken driver (2.6.21-rc1) on the 2.6.20 kernel? This will rule out any kernel changes. Or, if it's easier, run the driver from 2.6.20 on kernel 2.6.21-rc1.
I am leaning toward it not being a generic kernel issue, because right now I am running 2.6.21-rc2 on the same machine, which also has a Marvell NIC, and using the Yukon driver I do not experience this issue. But I will try it later when I get the opportunity.
Basically I can't get the 2.6.20 forcedeth driver to work with kernel 2.6.21-rc2. The module will load. I have loaded it via "insmod /var/lib/modules/2.6.20/..." as well as by copying it into the /var/lib/modules/2.6.21-rc2/... tree and letting udev handle it. Results are the same in both cases: the module loads but does not produce an ethN device (according to ifconfig -a). "dmesg" reveals the following:

forcedeth: no version for "struct_module" found: kernel tainted.

This is the first time I've loaded a kernel module across kernel versions, so there's always the possibility that I'm doing something wrong. The following is from "modinfo forcedeth":

filename: /lib/modules/2.6.21-rc2/kernel/drivers/net/forcedeth.ko
license: GPL
description: Reverse Engineered nForce ethernet driver
author: Manfred Spraul <manfred@colorfullife.com>
alias: pci:v000010DEd0000054Fsv*sd*bc*sc*i*
alias: pci:v000010DEd0000054Esv*sd*bc*sc*i*
alias: pci:v000010DEd0000054Dsv*sd*bc*sc*i*
alias: pci:v000010DEd0000054Csv*sd*bc*sc*i*
alias: pci:v000010DEd00000453sv*sd*bc*sc*i*
alias: pci:v000010DEd00000452sv*sd*bc*sc*i*
alias: pci:v000010DEd00000451sv*sd*bc*sc*i*
alias: pci:v000010DEd00000450sv*sd*bc*sc*i*
alias: pci:v000010DEd000003EFsv*sd*bc*sc*i*
alias: pci:v000010DEd000003EEsv*sd*bc*sc*i*
alias: pci:v000010DEd000003E6sv*sd*bc*sc*i*
alias: pci:v000010DEd000003E5sv*sd*bc*sc*i*
alias: pci:v000010DEd00000373sv*sd*bc*sc*i*
alias: pci:v000010DEd00000372sv*sd*bc*sc*i*
alias: pci:v000010DEd00000269sv*sd*bc*sc*i*
alias: pci:v000010DEd00000268sv*sd*bc*sc*i*
alias: pci:v000010DEd00000038sv*sd*bc*sc*i*
alias: pci:v000010DEd00000037sv*sd*bc*sc*i*
alias: pci:v000010DEd00000057sv*sd*bc*sc*i*
alias: pci:v000010DEd00000056sv*sd*bc*sc*i*
alias: pci:v000010DEd000000DFsv*sd*bc*sc*i*
alias: pci:v000010DEd000000E6sv*sd*bc*sc*i*
alias: pci:v000010DEd0000008Csv*sd*bc*sc*i*
alias: pci:v000010DEd00000086sv*sd*bc*sc*i*
alias: pci:v000010DEd000000D6sv*sd*bc*sc*i*
alias: pci:v000010DEd00000066sv*sd*bc*sc*i*
alias: pci:v000010DEd000001C3sv*sd*bc*sc*i*
depends:
vermagic: 2.6.20.1 SMP preempt mod_unload K8
parm: max_interrupt_work:forcedeth maximum events handled per interrupt (int)
parm: optimization_mode:In throughput mode (0), every tx & rx packet will generate an interrupt. In CPU mode (1), interrupts are controlled by a timer. (int)
parm: poll_interval:Interval determines how frequent timer interrupt is generated by [(time_in_micro_secs * 100) / (2^10)]. Min is 0 and Max is 65535. (int)
parm: msi:MSI interrupts are enabled by setting to 1 and disabled by setting to 0. (int)
parm: msix:MSIX interrupts are enabled by setting to 1 and disabled by setting to 0. (int)
parm: dma_64bit:High DMA is enabled by setting to 1 and disabled by setting to 0. (int)
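The modinfo output above reports a vermagic of 2.6.20.1 for a module that was loaded into 2.6.21-rc2. Whether that difference explains the missing ethN device is not established in this thread, but the comparison itself is easy to script. The following is a minimal sketch; `vermagic_release` and `check_vermagic` are hypothetical helper names, and it assumes the kernel release is the first whitespace-separated field of the vermagic string.

```shell
#!/bin/sh
# Sketch: compare the kernel release embedded in a module's vermagic
# string with a given kernel release. Helper names are hypothetical.

# Print the first whitespace-separated field of a vermagic string,
# assumed here to be the kernel release the module was built for.
vermagic_release() {
    printf '%s\n' "${1%% *}"
}

# Print "match" or "mismatch" for a vermagic string ($1) and a
# kernel release ($2).
check_vermagic() {
    if [ "$(vermagic_release "$1")" = "$2" ]; then
        echo match
    else
        echo mismatch
    fi
}

# Values taken from this report. On a live system they would come from:
#   modinfo -F vermagic /path/to/forcedeth.ko
#   uname -r
check_vermagic "2.6.20.1 SMP preempt mod_unload K8" "2.6.21-rc2"
```

With the values from this report the script prints "mismatch".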
Created attachment 10644 [details] Kernel Oops in 2.6.21-rc3
I downloaded the rc3 patch today and installed it. Basically the same thing happens. I'll stream a video over NFS, it'll play for maybe a minute, then oops. No problems, however, with the 2.6.21-rc* Yukon driver or the 2.6.20 forcedeth one. This oops looks like it didn't get a chance to finish writing to the console.
Created attachment 10704 [details] Fixed calling the interrupt routine based on descriptor version.
Please try out this patch and let me know if it fixes this issue.
Albert, could you please try out the new patch? Thanks.
Sorry, I've been busy. Will try to look at the patch today.
Applied the patch against 2.6.21-rc4. So far after ~2 hours streaming via NFS there have been no oopses. However, I am getting a multitude of the following kernel message: eth0: too many iterations (6) in nv_nic_irq.
Great! Those messages are benign. You can bump up the module parameter max_interrupt_work to a value of 10, for example. It just means that there is a lot of traffic flowing.
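For reference, one way to apply the suggested setting might look like the sketch below. It only prints the modprobe "options" line and changes nothing on the running system. `forcedeth_options_line` is a hypothetical helper name, the value 10 is just the example given above, and the location of the modprobe configuration file varies by distribution.

```shell
#!/bin/sh
# Sketch: build the setting suggested above. Nothing here touches the
# running system. forcedeth_options_line is a hypothetical helper name.

# Print a modprobe "options" line that sets max_interrupt_work to $1.
forcedeth_options_line() {
    printf 'options forcedeth max_interrupt_work=%s\n' "$1"
}

# Line to place in the modprobe configuration
# (file location varies by distribution).
forcedeth_options_line 10

# One-off alternative, run as root. It reloads the driver, so the link
# on the forcedeth NIC drops while it runs:
#   modprobe -r forcedeth
#   modprobe forcedeth max_interrupt_work=10
```

The parameter name comes from the modinfo listing earlier in this report. The best value depends on traffic levels and is not specified here beyond the example of 10.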
Ok, thanks. In that case feel free to close this one out whenever.