Bug 9047

Summary: (net forcedeth) MCP55 oops under heavy load
Product: Drivers Reporter: Joshua Hoblitt (j_kernel)
Component: NetworkAssignee: Ayaz Abdulla (aabdulla)
Status: CLOSED CODE_FIX    
Severity: high CC: a.girlich, j_kernel, kernel, manfred, price, selecter, viet, yyyeer.bo
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.26 Subsystem:
Regression: --- Bisected commit-id:
Attachments: .config
/proc/iomem
/proc/ioports
lspci -vvv
/proc/meminfo
/proc/cpuinfo
forcedeth.c
Add timer lock
New patch for locking
lspci -v -xxx
debug timer callbacks and interrupts
dmesg output from debugging patch
test patch - is skb really NULL?
forcedeth.o
bugfix: pci_enable_msi changes pci_dev->irq
Manfred's patch with trivial changes need to apply to netdev-2.6
s2912 /proc/interupts
s2927 /proc/interupts
2.6.27-rc5 (netdev-2.6) .config

Description Joshua Hoblitt 2007-09-20 21:22:16 UTC
Most recent kernel where this bug did not occur: unknown
Distribution: Gentoo 2007.0
Hardware Environment: amd64/x86_64 - 2 x Opeteron 2220s on Tyan S2927 motherboard
Software Environment: Gentoo Linux, mostly 64-bit userland w/ multilib
Problem Description:The kernel will either Oops from a NULL ptr dereference in forcedeth.c or the network interface will stop but the system will still respond on the serial while logging message about irq abuse into the dmesg.

Steps to reproduce:  We have an in house application which makes heavy use of NFS.  Allowing this program to run for a few minutes will consistently reproduce the issue.  It is almost always an Oops on a single CPU with perhaps a 1 out of 5 chance of seeing the dead network but system still up failure mode.  This issue also occurs with 2.6.22 "gentoo-r3" but the "NULL" value being deferenced is 0x64 instead of 0x7c.  Perhaps the strangest of all, having sysctl parameters set which affect tcp window scaling prevent a full stack trace in the oops message.  Here is the most frequently see Oops message:

Unable to handle kernel NULL pointer dereference at 000000000000007c RIP: 
 [<ffffffff880ad47d>] :forcedeth:nv_rx_process_optimized+0xe6/0x36b
PGD 219ee5067 PUD 219d9e067 PMD 0 
Oops: 0000 [1] SMP 
CPU 3 
Modules linked in: i2c_core autofs4 smsc37b787_wdt forcedeth rtc_cmos rtc_core rtc_lib aacraid 3w_9xxx 3w_xxxx atp870u aic7xxx arcmsr tg3 e1000 dm_mirror
Pid: 0, comm: swapper Not tainted 2.6.23-rc7-git1 #1
RIP: 0010:[<ffffffff880ad47d>]  [<ffffffff880ad47d>] :forcedeth:nv_rx_process_optimized+0xe6/0x36b
RSP: 0018:ffff8102246b7eb8  EFLAGS: 00010246
RAX: 0000000014000000 RBX: 0000000000000000 RCX: 00000000d8000000
RDX: ffff810128048a80 RSI: 000000014ea5e810 RDI: ffff810123bf1870
RBP: 0000000034020362 R08: 0000000000000000 R09: 0000000000000048
R10: 0000000000000010 R11: ffff810123ba67c0 R12: ffff810121b8c740
R13: 0000000000000362 R14: ffff810121b8c000 R15: 0000000000000001
FS:  00002ba6eb3016d0(0000) GS:ffff81022464f480(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000000007c CR3: 0000000219a57000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff810123b6e000, task ffff8102246aa000)
Stack: 

However, removing these values from sysctl.conf allows a full stack trace to occur.

net.core.rmem_max = 16777216
net.core.rmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216 
net.ipv4.tcp_wmem = 4096 87380 16777216
net.core.netdev_max_backlog = 2500

 Unable to handle kernel NULL pointer dereference at 000000000000007c RIP: 
 [<ffffffff880a047d>] :forcedeth:nv_rx_process_optimized+0xe6/0x36b
PGD 0 
Oops: 0000 [1] SMP 
CPU 1 
Modules linked in: i2c_core autofs4 smsc37b787_wdt rtc_cmos rtc_core rtc_lib forcedeth aacraid 3w_9xxx 3w_xxxx atp870u aic7xxx arcmsr tg3 e1000 dm_mirror
Pid: 0, comm: swapper Not tainted 2.6.23-rc7-git1 #1
RIP: 0010:[<ffffffff880a047d>]  [<ffffffff880a047d>] :forcedeth:nv_rx_process_optimized+0xe6/0x36b
RSP: 0018:ffff810123b17e60  EFLAGS: 00010246
RAX: 0000000014000000 RBX: 0000000000000000 RCX: 00000000d8000000
RDX: 0000000034020042 RSI: 000000021b135810 RDI: ffff81012340c070
RBP: 0000000034020042 R08: ffff81022239b810 R09: ffff8101215b7000
R10: ffff81021bbf1900 R11: 0000000000000002 R12: ffff8101215b7740
R13: 0000000000000042 R14: ffff8101215b7000 R15: 000000000000001d
FS:  00002b9da0edb6d0(0000) GS:ffff810123f01500(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000000007c CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff810224662000, task ffff810123b10000)
Stack:  00000040215b7000 ffff8101215b7740 0000000000000002 ffffc20002760000
 ffff8101215b7000 0000000000000001 0000000000000000 ffffffff880a08a2
 ffff8101215b7740 ffff8101215b7000 00000000000000ff ffffc20002760000
Call Trace:
 <IRQ>  [<ffffffff880a08a2>] :forcedeth:nv_nic_irq_optimized+0x89/0x22c
 [<ffffffff880a13cf>] :forcedeth:nv_do_nic_poll+0x203/0x299
 [<ffffffff880a11cc>] :forcedeth:nv_do_nic_poll+0x0/0x299
 [<ffffffff8023ded0>] run_timer_softirq+0x16a/0x1df
 [<ffffffff8023ad09>] __do_softirq+0x55/0xc4
 [<ffffffff8020cd7c>] call_softirq+0x1c/0x28
 [<ffffffff8020debd>] do_softirq+0x2c/0x7d
 [<ffffffff8021bbff>] smp_apic_timer_interrupt+0x49/0x5e
 [<ffffffff8020ae61>] default_idle+0x0/0x3d
 [<ffffffff8020c826>] apic_timer_interrupt+0x66/0x70
 <EOI>  [<ffffffff8020ae8a>] default_idle+0x29/0x3d
 [<ffffffff8020af5a>] cpu_idle+0xbc/0xdf


Code: 8a 43 7c 83 e0 f3 83 c8 04 88 43 7c 83 7b 6c 00 8b 83 a8 00 
RIP  [<ffffffff880a047d>] :forcedeth:nv_rx_process_optimized+0xe6/0x36b
 RSP <ffff810123b17e60>
CR2: 000000000000007c
Unable to handle kernel NULL pointer dereference<0>Kernel panic - not syncing: Aiee, killing interrupt handler!
 at 000000000000007c RIP: 
 [<ffffffff880a047d>] :forcedeth:nv_rx_process_optimized+0xe6/0x36b
PGD 109aaa067 PUD 109d15067 PMD 0 
Oops: 0000 [2] SMP 
CPU 3 
Modules linked in: i2c_core autofs4 smsc37b787_wdt rtc_cmos rtc_core rtc_lib forcedeth aacraid 3w_9xxx 3w_xxxx atp870u aic7xxx arcmsr tg3 e1000 dm_mirror
Pid: 0, comm: swapper Tainted: G      D 2.6.23-rc7-git1 #1
RIP: 0010:[<ffffffff880a047d>]  [<ffffffff880a047d>] :forcedeth:nv_rx_process_optimized+0xe6/0x36b
RSP: 0018:ffff8102246b3ae8  EFLAGS: 00010246
RAX: 0000000014000000 RBX: 0000000000000000 RCX: 00000000d8000000
RDX: 0000000000000020 RSI: 000000021b135810 RDI: ffff81012340c070
RBP: 0000000034020042 R08: ffff81021fbd8700 R09: ffff81021fb43900
R10: 00000000fc149542 R11: 0000000000000048 R12: ffff8101215b7740
R13: 0000000000000042 R14: ffff8101215b7000 R15: 0000000000000001
FS:  00002b9da0edb6d0(0000) GS:ffff81022464e480(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000000007c CR3: 000000011dda7000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff810123b70000, task ffff8102246ac000)
Stack:  00000040215b7740 ffff8101215b7740 0000000000000010 ffffc20002760000
 ffff8101215b7000 0000000000000000 0000000000000503 ffffffff880a08a2
 ffff810224690800 0000000000000000 0000000000000000 00000000000004fc
Call Trace:
 <IRQ>  [<ffffffff880a08a2>] :forcedeth:nv_nic_irq_optimized+0x89/0x22c
 [<ffffffff80259828>] handle_IRQ_event+0x25/0x53
 [<ffffffff8025ae2c>] handle_edge_irq+0xe4/0x127
 [<ffffffff8020e12c>] do_IRQ+0xf1/0x161
 [<ffffffff8020c101>] ret_from_intr+0x0/0xa
 [<ffffffff8027eba8>] kfree+0x79/0x7e
 [<ffffffff804bb042>] kfree_skbmem+0x9/0x6f
 [<ffffffff804e9089>] tcp_ack+0x6c8/0x1914
 [<ffffffff804bb8dc>] __alloc_skb+0x84/0x133
 [<ffffffff804ed317>] tcp_rcv_established+0x7ac/0x8cd
 [<ffffffff804f2a50>] tcp_v4_do_rcv+0x30/0x344
 [<ffffffff8022d572>] find_busiest_group+0x254/0x6d6
 [<ffffffff804f4f35>] tcp_v4_rcv+0x81f/0x88a
 [<ffffffff804da87c>] ip_local_deliver+0xca/0x14c
 [<ffffffff804dadbd>] ip_rcv+0x4bf/0x502
 [<ffffffff804c21b6>] process_backlog+0x84/0x102
 [<ffffffff804c23c0>] net_rx_action+0xa8/0x1ad
 [<ffffffff8023ad09>] __do_softirq+0x55/0xc4
 [<ffffffff8020cd7c>] call_softirq+0x1c/0x28
 [<ffffffff8020debd>] do_softirq+0x2c/0x7d
 [<ffffffff8020e179>] do_IRQ+0x13e/0x161
 [<ffffffff8020ae61>] default_idle+0x0/0x3d
 [<ffffffff8020c101>] ret_from_intr+0x0/0xa
 <EOI>  [<ffffffff8020ae8a>] default_idle+0x29/0x3d
 [<ffffffff8020af5a>] cpu_idle+0xbc/0xdf


Code: 8a 43 7c 83 e0 f3 83 c8 04 88 43 7c 83 7b 6c 00 8b 83 a8 00 
RIP  [<ffffffff880a047d>] :forcedeth:nv_rx_process_optimized+0xe6/0x36b
 RSP <ffff8102246b3ae8>
CR2: 000000000000007c
WARNING: at drivers/char/vt.c:3566 do_unblank_screen()

Call Trace:
 <IRQ>  [<ffffffff803a7f05>] do_unblank_screen+0x3e/0x120
 [<ffffffff8035ea15>] bust_spinlocks+0x19/0x2c
 [<ffffffff8054b5fc>] oops_end+0x15/0x57
 [<ffffffff8054d025>] do_page_fault+0x6f0/0x7d1
 [<ffffffff804b0db7>] pci_conf1_read+0xbb/0xc6
 [<ffffffff804b0db7>] pci_conf1_read+0xbb/0xc6
 [<ffffffff804b0e79>] pci_conf1_write+0xb7/0xca
 [<ffffffff8054b26d>] error_exit+0x0/0x84
 [<ffffffff880a047d>] :forcedeth:nv_rx_process_optimized+0xe6/0x36b
 [<ffffffff880a03ee>] :forcedeth:nv_rx_process_optimized+0x57/0x36b
 [<ffffffff880a08a2>] :forcedeth:nv_nic_irq_optimized+0x89/0x22c
 [<ffffffff80259828>] handle_IRQ_event+0x25/0x53
 [<ffffffff8025ae2c>] handle_edge_irq+0xe4/0x127
 [<ffffffff8020e12c>] do_IRQ+0xf1/0x161
 [<ffffffff8020c101>] ret_from_intr+0x0/0xa
 [<ffffffff8027eba8>] kfree+0x79/0x7e
 [<ffffffff804bb042>] kfree_skbmem+0x9/0x6f
 [<ffffffff804e9089>] tcp_ack+0x6c8/0x1914
 [<ffffffff804bb8dc>] __alloc_skb+0x84/0x133
 [<ffffffff804ed317>] tcp_rcv_established+0x7ac/0x8cd
 [<ffffffff804f2a50>] tcp_v4_do_rcv+0x30/0x344
 [<ffffffff8022d572>] find_busiest_group+0x254/0x6d6
 [<ffffffff804f4f35>] tcp_v4_rcv+0x81f/0x88a
 [<ffffffff804da87c>] ip_local_deliver+0xca/0x14c
 [<ffffffff804dadbd>] ip_rcv+0x4bf/0x502
 [<ffffffff804c21b6>] process_backlog+0x84/0x102
 [<ffffffff804c23c0>] net_rx_action+0xa8/0x1ad
 [<ffffffff8023ad09>] __do_softirq+0x55/0xc4
 [<ffffffff8020cd7c>] call_softirq+0x1c/0x28
 [<ffffffff8020debd>] do_softirq+0x2c/0x7d
 [<ffffffff8020e179>] do_IRQ+0x13e/0x161
 [<ffffffff8020ae61>] default_idle+0x0/0x3d
 [<ffffffff8020c101>] ret_from_intr+0x0/0xa
 <EOI>  [<ffffffff8020ae8a>] default_idle+0x29/0x3d
 [<ffffffff8020af5a>] cpu_idle+0xbc/0xdf

And here are the messages that appear in the dmesg when the network dies without causing an oops (we've only seen this twice):

eth0: too many iterations (6) in nv_nic_irq.
eth0: too many iterations (6) in nv_nic_irq.
eth0: too many iterations (6) in nv_nic_irq.
eth0: too many iterations (6) in nv_nic_irq.
eth0: too many iterations (6) in nv_nic_irq.
eth0: too many iterations (6) in nv_nic_irq.
eth0: too many iterations (6) in nv_nic_irq.
eth0: too many iterations (6) in nv_nic_irq.
eth0: too many iterations (6) in nv_nic_irq.
eth0: too many iterations (6) in nv_nic_irq.
eth0: too many iterations (6) in nv_nic_irq.
eth0: too many iterations (6) in nv_nic_irq.
eth0: too many iterations (6) in nv_nic_irq.
eth0: too many iterations (6) in nv_nic_irq.
eth0: too many iterations (6) in nv_nic_irq.
eth0: too many iterations (6) in nv_nic_irq.
eth0: too many iterations (6) in nv_nic_irq.
eth0: too many iterations (6) in nv_nic_irq.
eth0: too many iterations (6) in nv_nic_irq.
eth0: too many iterations (6) in nv_nic_irq.
eth0: too many iterations (6) in nv_nic_irq.
eth0: too many iterations (6) in nv_nic_irq.
eth0: too many iterations (6) in nv_nic_irq.
eth0: too many iterations (6) in nv_nic_irq.
eth0: too many iterations (6) in nv_nic_irq.
eth0: too many iterations (6) in nv_nic_irq.
eth0: too many iterations (6) in nv_nic_irq.

I also should add that it is only very heavy network traffic that causes this issue as this system has been in use for months without noticing any problems until we started trying to run this particular applicationt.  It may also be worth noting that the application is doing a far amount of access to a volume that is exported from the system via NFS that is then mounted locally (by the automounter).

Note: all of the attached files were taken from an identical system as the exact system is "off site" to me and I can't access it until tomorrow.  While this system is identical it is possible that the motherboard has a slightly different hw ver or bios on it.
Comment 1 Joshua Hoblitt 2007-09-20 21:22:57 UTC
Created attachment 12891 [details]
.config
Comment 2 Joshua Hoblitt 2007-09-20 21:23:18 UTC
Created attachment 12892 [details]
/proc/iomem
Comment 3 Joshua Hoblitt 2007-09-20 21:23:41 UTC
Created attachment 12893 [details]
/proc/ioports
Comment 4 Joshua Hoblitt 2007-09-20 21:23:56 UTC
Created attachment 12894 [details]
lspci -vvv
Comment 5 Joshua Hoblitt 2007-09-20 21:24:09 UTC
Created attachment 12895 [details]
/proc/meminfo
Comment 6 Joshua Hoblitt 2007-09-20 21:24:28 UTC
Created attachment 12896 [details]
/proc/cpuinfo
Comment 7 Ayaz Abdulla 2007-09-20 23:34:45 UTC
1) Can you try with msi=0 msix=0 module parameters when you load the driver?
2) Can you try with max_interrupt_work module parameter set to a really high value? ie. 500
Comment 8 Joshua Hoblitt 2007-09-21 17:23:42 UTC
I've tested both #1 and #2 per your request.  The executive summary is that #1,
#2, and #1 & #2 together all prevent the oops.  My naive impression is that #2
is the better fix as it leaves MSI enabled (I assume that this is superior to
INTx emulation) while preventing the oops and since these are 4-way SMP
systems the extra latency in the driver shouldn't be a killer.

Is there a means of highwater marking the number of events that are queued up?

Why does the event queue filling up cause an oops?

# modprobe forcedeth msi=0 msix=0

ACPI: PCI interrupt for device 0000:00:09.0 disabled
ACPI: PCI interrupt for device 0000:00:08.0 disabled
forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.
ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LMAC] -> GSI 22 (level, low) -> IRQ 22
PCI: Setting latency timer of device 0000:00:08.0 to 64
forcedeth: using HIGHDMA
eth0: forcedeth.c: subsystem: 010de:cb84 bound to 0000:00:08.0
ACPI: PCI Interrupt 0000:00:09.0[A] -> Link [LMAD] -> GSI 21 (level, low) -> IRQ 21
PCI: Setting latency timer of device 0000:00:09.0 to 64
forcedeth: using HIGHDMA
eth1: forcedeth.c: subsystem: 010de:cb84 bound to 0000:00:09.0
eth0: no IPv6 routers present
eth0: no IPv6 routers present

running with heavy nfs load for ~30mins

dmesg | grep "eth0: too many iterations (6) in nv_nic_irq." | wc -l
700


# modprobe forcedeth msi=0 msix=0 max_interrupt_work=500

ACPI: PCI interrupt for device 0000:00:09.0 disabled
ACPI: PCI interrupt for device 0000:00:08.0 disabled
forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.
ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LMAC] -> GSI 22 (level, low) -> IRQ 22
PCI: Setting latency timer of device 0000:00:08.0 to 64
forcedeth: using HIGHDMA
eth0: forcedeth.c: subsystem: 010de:cb84 bound to 0000:00:08.0
ACPI: PCI Interrupt 0000:00:09.0[A] -> Link [LMAD] -> GSI 21 (level, low) -> IRQ 21
PCI: Setting latency timer of device 0000:00:09.0 to 64
forcedeth: using HIGHDMA
eth1: forcedeth.c: subsystem: 010de:cb84 bound to 0000:00:09.0
eth0: no IPv6 routers present
eth0: no IPv6 routers present

running with heavy nfs load for ~30mins

no new dmesg entries

# modprobe forcedeth max_interrupt_work=500

ACPI: PCI interrupt for device 0000:00:09.0 disabled
ACPI: PCI interrupt for device 0000:00:08.0 disabled
forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.
ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LMAC] -> GSI 22 (level, low) -> IRQ 22
PCI: Setting latency timer of device 0000:00:08.0 to 64
forcedeth: using HIGHDMA
eth0: forcedeth.c: subsystem: 010de:cb84 bound to 0000:00:08.0
ACPI: PCI Interrupt 0000:00:09.0[A] -> Link [LMAD] -> GSI 21 (level, low) -> IRQ 21
PCI: Setting latency timer of device 0000:00:09.0 to 64
forcedeth: using HIGHDMA
eth1: forcedeth.c: subsystem: 010de:cb84 bound to 0000:00:09.0
eth0: no IPv6 routers present

The "ACPI: PCI interrupt for device 0000:00:09.0 disabled" lines made me
suspect that the msi disabled flags were sticking around in the hardware so I
set "options forcedeth max_interrupt_work=500" and rebooted to the system to
get the hardware back to it's default state.  The boot dmesg ommited the "disabled" lines.

forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.
ACPI: PCI Interrupt Link [LMAC] enabled at IRQ 22 ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LMAC] -> GSI 22 (level, low) -> IRQ 22
PCI: Setting latency timer of device 0000:00:08.0 to 64
forcedeth: using HIGHDMA
eth0: forcedeth.c: subsystem: 010de:cb84 bound to 0000:00:08.0
ACPI: PCI Interrupt Link [LMAD] enabled at IRQ 21
ACPI: PCI Interrupt 0000:00:09.0[A] -> Link [LMAD] -> GSI 21 (level, low) -> IRQ 21 PCI: Setting latency timer of device 0000:00:09.0 to 64
forcedeth: using HIGHDMA
eth1: forcedeth.c: subsystem: 010de:cb84 bound to 0000:00:09.0

This is configuration seems to be stable under heavy NFS load with no unusual dmesg entries.

Note that all testing was with the following sysctl explicitly set:

net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.core.rmem_max = 16777216
net.core.rmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 87380 16777216
net.core.netdev_max_backlog = 2500
Comment 9 Ayaz Abdulla 2007-09-24 10:59:44 UTC
Can you verify in original setup whether MSI or MSIX was setup? You can do "cat /proc/interrupts" to verify this. It will say MSI or MSIX or regular interrupts.

I have a hunch on why the network dies without the oops. I still need to investigate the "oops".
Comment 10 Joshua Hoblitt 2007-09-24 14:44:08 UTC
(this is a copy of an email I sent directly to Ayaz this morning while bugzilla was down)

The kernel.org bugzilla went down as I was responding.  Here is the
/proc/interrupts (I thought I'd sent that in with the original bug report
but I guess not).

# cat /proc/interrupts 
           CPU0       CPU1       CPU2       CPU3       
  0:       1265      37207      65587   72857808   IO-APIC-edge      timer
  1:          0          0          0          8   IO-APIC-edge      i8042
  4:       1474       1304       3994     171125   IO-APIC-edge      serial
  6:          0          0          0          2   IO-APIC-edge      floppy
  8:          0          0          0          1   IO-APIC-edge      rtc
  9:          0          0          0          0   IO-APIC-fasteoi   acpi
 14:          0          0          0         59   IO-APIC-edge      ide0
 15:          0          0          0          0   IO-APIC-edge      libata
 19:     182100     373567      72387    3712628   IO-APIC-fasteoi   arcmsr
 20:          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb1
 21:          0          0          0          0   IO-APIC-fasteoi   sata_nv
 22:          0          0          0          0   IO-APIC-fasteoi   sata_nv
 23:          0          0          0          0   IO-APIC-fasteoi   sata_nv, ohci_hcd:usb2
1276:      33167   62789742     665173  235921228   PCI-MSI-edge      eth0
NMI:          0          0          0          0 
LOC:   72961951   72961997   72961953   72961895 
ERR:          0

If the int distribution looks strange it's because I started up the irqbalance
daemon at the end of business on Friday thinking that the system was stable.
The system is however not stable.  As of this morning the kernel was
responding to icmp (echo requests) but other then that you can not open sockets
to/from the system.  The serial console is still responding but the system is
very unhappy because of hung NFS mounts.  I guess the upside is that system ran
for decent period with a heavy network load before failing:

eth0      Link encap:Ethernet  HWaddr 00:E0:81:76:5F:52  
          inet addr:128.171.227.106  Bcast:128.171.227.255  Mask:255.255.255.0
          inet6 addr: fe80::2e0:81ff:fe76:5f52/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:199898797 errors:434 dropped:0 overruns:0 frame:434
          TX packets:160862503 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:87001054754 (82970.6 Mb)  TX bytes:338307355734 (322635.0 Mb)
          Interrupt:22 

There was nothing of interest in the dmesg except nfs timeouts.  rmmod forcedeth; modprobe forcedeth brought the network right back up.

-J
Comment 11 Ayaz Abdulla 2007-09-25 13:13:58 UTC
Can you send me your forcedeth.c file so that I can create a patch for you to try?
Comment 12 Joshua Hoblitt 2007-09-25 13:56:55 UTC
Created attachment 12942 [details]
forcedeth.c
Comment 13 Ayaz Abdulla 2007-09-25 22:42:19 UTC
Created attachment 12949 [details]
Add timer lock

Here is a potential fix for the Oops. Can you give it a try?
Comment 14 Joshua Hoblitt 2007-09-26 16:46:23 UTC
I patched the same 2.6.23-rc7-git1 kernel with the patch, rebuilt, removed all options passed to the forcedeth driver and rebooted for good measure to make sure the hardware was in a clean state.  It ran for ~10mins under heavy load before giving up with another oops.  Unfortunately it looks like the serial console went dead before we got the entire backtrace which is pretty consistent with the old oops behavior.

Unable to handle kernel NULL pointer dereference at 000000000000007c RIP: 
 [<ffffffff880a047d>] :forcedeth:nv_rx_process_optimized+0xe6/0x36b
PGD 11f01d067 PUD 122138067 PMD 0 
Oops: 0000 [1] SMP 
CPU 1 
Modules linked in: i2c_core autofs4 smsc37b787_wdt rtc_cmos rtc_core rtc_lib forcedeth aacraid 3w_9xxx 3w_xxxx atp870u aic7xxx arcmsr tg3 e1000 dm_mirror
Pid: 0, comm: swapper Not tainted 2.6.23-rc7-git1 #2
RIP: 0010:[<ffffffff880a047d>]  [<ffffffff880a047d>] :forcedeth:nv_rx_process_optimized+0xe6/0x36b
RSP: 0018:ffff810224667e60  EFLAGS: 00010046
RAX: 0000000014000000 RBX: 0000000000000000 RCX: 00000000d8000000
RDX: ffff810096e78f00 RSI: 00000000b4f64010 RDI: ffff810224076070
RBP: 0000000034020042 R08: 0000000000000000 R09: ffff810096e78f00
R10: 0000000000000001 R11: 0000000000000002 R12: ffff8101226ca740
R13: 0000000000000042 R14: ffff8101226ca000 R15: 0000000000000001
FS:  00002b9bd0b85380(0000) GS:ffff810123f01500(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000000007c CR3: 000000011f036000 CR4: 00000000000006e0
Comment 15 Joshua Hoblitt 2007-09-26 16:49:38 UTC
Also worth noting, we've replicated this oops on a completely different system with the same Tyan S2927 motherboard so I think we've pretty much ruled hardware problems (other than inherent design flaw) out here.
Comment 16 Joshua Hoblitt 2007-09-26 20:45:20 UTC
Per request, I changed the modules revision number to prove that the patched version is being loaded.

forcedeth.c: Reverse Engineered nForce ethernet driver. Version 3.1415.

We got the full backtrace on panic this time thou.

 Unable to handle kernel NULL pointer dereference at 000000000000007c RIP: 
 [<ffffffff880a047d>] :forcedeth:nv_rx_process_optimized+0xe6/0x36b
PGD 21f1a2067 PUD 21f1b9067 PMD 0 
Oops: 0000 [1] SMP 
CPU 1 
Modules linked in: i2c_core autofs4 smsc37b787_wdt rtc_cmos rtc_core rtc_lib forcedeth aacraid 3w_9xxx 3w_xxxx atp870u aic7xxx arcmsr tg3 e1000 dm_mirror
Pid: 7813, comm: nfsd Not tainted 2.6.23-rc7-git1 #2
RIP: 0010:[<ffffffff880a047d>]  [<ffffffff880a047d>] :forcedeth:nv_rx_process_optimized+0xe6/0x36b
RSP: 0018:ffff810224667eb8  EFLAGS: 00010246
RAX: 0000000014000000 RBX: 0000000000000000 RCX: 00000000d8000000
RDX: ffffffff8027df65 RSI: 0000000173737810 RDI: ffff810224700070
RBP: 00000000340205ea R08: ffff81022047dd28 R09: ffffffff802c7580
R10: ffff810121b471a0 R11: 00000000000011b1 R12: ffff8101224c7740
R13: 00000000000005ea R14: ffff8101224c7000 R15: 0000000000000001
FS:  00002b97dd5a36d0(0000) GS:ffff810123f01500(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000000007c CR3: 000000021f7e7000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process nfsd (pid: 7813, threadinfo ffff81022047c000, task ffff81022097f740)
Stack:  00000040224c7740 ffff8101224c7740 0000000000000010 ffffc20002760000
 ffff8101224c7000 0000000000000000 0000000000002000 ffffffff880a08a2
 ffff810122056740 0000000000000000 0000000000000000 00000000000004fc
Call Trace:
 <IRQ>  [<ffffffff880a08a2>] :forcedeth:nv_nic_irq_optimized+0x89/0x22c
 [<ffffffff80259828>] handle_IRQ_event+0x25/0x53
 [<ffffffff8025ae2c>] handle_edge_irq+0xe4/0x127
 [<ffffffff8020e12c>] do_IRQ+0xf1/0x161
 [<ffffffff802c7580>] ext3_file_write+0x0/0x94
 [<ffffffff8020c101>] ret_from_intr+0x0/0xa
 <EOI>  [<ffffffff802c7580>] ext3_file_write+0x0/0x94
 [<ffffffff802834db>] do_sync_readv_writev+0xb7/0x107
 [<ffffffff80294959>] iput+0x42/0x7b
 [<ffffffff80247a06>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8053ff5f>] cache_check+0x64/0x433
 [<ffffffff803210f1>] exp_find_key+0x96/0xa9
 [<ffffffff8022c71f>] enqueue_task+0x3c/0x4f
 [<ffffffff80283379>] rw_copy_check_uvector+0x6c/0xdc
 [<ffffffff80283b2d>] do_readv_writev+0xb2/0x18b
 [<ffffffff8031c105>] nfsd_acceptable+0x0/0xca
 [<ffffffff8031cfd5>] nfsd_permission+0x23/0xf1
 [<ffffffff8031d60a>] nfsd_vfs_write+0xe9/0x2dd
 [<ffffffff80282170>] __dentry_open+0x101/0x1aa
 [<ffffffff8031dc9a>] nfsd_open+0x13c/0x170
 [<ffffffff8031ded8>] nfsd_write+0xc5/0xe2
 [<ffffffff803253df>] nfsd3_proc_write+0xfb/0x11a
 [<ffffffff80319a46>] nfsd_dispatch+0xde/0x1e5
 [<ffffffff80539752>] svc_process+0x3df/0x6ed
 [<ffffffff8054a89b>] __down_read+0x12/0x9a
 [<ffffffff8031a060>] nfsd+0x1a1/0x2b9
 [<ffffffff8020ca08>] child_rip+0xa/0x12
 [<ffffffff80319ebf>] nfsd+0x0/0x2b9
 [<ffffffff8020c9fe>] child_rip+0x0/0x12


Code: 8a 43 7c 83 e0 f3 83 c8 04 88 43 7c 83 7b 6c 00 8b 83 a8 00 
RIP  [<ffffffff880a047d>] :forcedeth:nv_rx_process_optimized+0xe6/0x36b
 RSP <ffff810224667eb8>
CR2: 000000000000007c
Kernel panic - not syncing: Aiee, killing interrupt handler!
Comment 17 Ayaz Abdulla 2007-09-27 09:19:55 UTC
Created attachment 12961 [details]
New patch for locking

Can you try out this new patch? It protects all access to rx_process and alloc_rx for non-NAPI.
Comment 18 Joshua Hoblitt 2007-09-27 15:57:42 UTC
This patch appears to be a step in the right direction.  I applied the patch to clean -rc7-git1 sources, installed the modules and rebooted.  I again changed the version number in the module to prove that the correct driver is being loaded on boot:

forcedeth.c: Reverse Engineered nForce ethernet driver. Version 6.283.
ACPI: PCI Interrupt Link [LMAC] enabled at IRQ 22
ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LMAC] -> GSI 22 (level, low) -> IRQ 22
PCI: Setting latency timer of device 0000:00:08.0 to 64
forcedeth: using HIGHDMA
eth0: forcedeth.c: subsystem: 010de:cb84 bound to 0000:00:08.0
ACPI: PCI Interrupt Link [LMAD] enabled at IRQ 21
ACPI: PCI Interrupt 0000:00:09.0[A] -> Link [LMAD] -> GSI 21 (level, low) -> IRQ 21
PCI: Setting latency timer of device 0000:00:09.0 to 64
forcedeth: using HIGHDMA
eth1: forcedeth.c: subsystem: 010de:cb84 bound to 0000:00:09.0

The system seemed to fair pretty well this time with the familiar messages showing up in the dmesg under heavy load.

dmesg | grep "eth0: too many iterations (6) in nv_nic_irq." | wc -l
1280

The "standard" testing load we were using before wasn't enough to push the machine over.  We had to up the NFS client load from ~900mbit/s to ~1gbit/s before the network interface died.  It appears that the oops is gone thou.  We are going to run this test a few more times.
Comment 19 Ayaz Abdulla 2007-09-27 16:02:55 UTC
When the network interface dies, can you see if the interrupt count is incrementing from 'cat /proc/interrupts'. Can you also verify if either the tx or rx counters are incrementing from 'ifconfig'. Also, please send output of "ethtool -d ethX"

Thanks.
Comment 20 Joshua Hoblitt 2007-09-27 17:42:38 UTC
Will do.  We seem to be having trouble getting it to fall over again - this is be far the longest it's ever made. 

dmesg | grep "eth0: too many iterations (6) in nv_nic_irq." | wc -l
4482
Comment 21 Joshua Hoblitt 2007-09-27 17:45:31 UTC
Of course, within seconds of my last posting that the system hard locked.  We didn't even get the oops on the serial console.  It just completely froze.
Comment 22 Joshua Hoblitt 2007-09-27 19:01:00 UTC
Just out of curiosity, is it worth trying CONFIG_FORCEDETH_NAPI since in theory this will reduce the number of hardware interupts?
Comment 23 Ayaz Abdulla 2007-09-28 14:04:48 UTC
You could give it a try as a data point, but only difference is that rx processing  is done through a napi handler instead within the IRQ handler.

Also, could you send me the verbose output of all pci config data at any time the system has the driver loaded? i.e. lspci -v -xxx
Comment 24 Joshua Hoblitt 2007-09-28 14:46:29 UTC
Created attachment 12984 [details]
lspci -v -xxx
Comment 25 Joshua Hoblitt 2007-09-28 17:47:59 UTC
Welp, It crashes with NAPI enabled as well.  The last few crashes w/o NAPI have been deadlocks without even an oops. How important is it to look at the system while it's in the dead NIC but the system is still failure mode?  It seems to be a random scatter of failure modes between; Oops, partial oops, deadlock, and dead NIC.
Comment 26 Joshua Hoblitt 2007-10-02 17:46:57 UTC
In an attempt to get these systems into a usable state we tried setting max_interrupt_work to an insanely high value.  The issue still persists at 5k and it looks like 50k is high enough to cause problems of it own (but we are testing this setting for stability right now)... 

options forcedeth msi=0 msix=0 max_interrupt_work=50000 

WARNING: at kernel/softirq.c:139 local_bh_enable()

Call Trace:
 <IRQ>  [<ffffffff8023abd0>] local_bh_enable+0x3a/0x91
 [<ffffffff804b99a0>] sock_queue_rcv_skb+0xed/0x105
 [<ffffffff804f868a>] udp_queue_rcv_skb+0x130/0x1f4
 [<ffffffff804f8c2d>] __udp4_lib_rcv+0x4df/0x749
 [<ffffffff804da87c>] ip_local_deliver+0xca/0x14c
 [<ffffffff804dadbd>] ip_rcv+0x4bf/0x502
 [<ffffffff8022c381>] __wake_up_common+0x3e/0x68
 [<ffffffff880a1f84>] :forcedeth:nv_napi_poll+0x567/0x729
 [<ffffffff804c23c0>] net_rx_action+0xa8/0x1ad
 [<ffffffff8809fe70>] :forcedeth:nv_nic_irq_optimized+0xa5/0x237
 [<ffffffff8023ad09>] __do_softirq+0x55/0xc4
 [<ffffffff8020cd7c>] call_softirq+0x1c/0x28
 [<ffffffff8020debd>] do_softirq+0x2c/0x7d
 [<ffffffff8020e179>] do_IRQ+0x13e/0x161
 [<ffffffff8020ae61>] default_idle+0x0/0x3d
 [<ffffffff8020c101>] ret_from_intr+0x0/0xa
 <EOI>  [<ffffffff8020ae8a>] default_idle+0x29/0x3d
 [<ffffffff8020af5a>] cpu_idle+0xbc/0xdf
Comment 27 Manfred Spraul 2007-10-03 04:06:51 UTC
Created attachment 13029 [details]
debug timer callbacks and interrupts

Hi,

I'm trying to understand why the spinlock fixed the problem. Therefore I've written a debug patch that searches for a race.
Could you apply the attached patch and run it, without any special options (i.e.: try to provoke a crash).
The patch is against the forcedeth.c file that you posted.

Thanks,
  Manfred
Comment 28 Joshua Hoblitt 2007-10-03 12:07:16 UTC
I think you mean "why the spinlock didn't fix the problem"?  I will try this patch and post the results.
Comment 29 Joshua Hoblitt 2007-10-07 14:18:03 UTC
Jeff,

Is there anything in git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git fe-lock

that might help resolve this issue?
Comment 30 Jeff Garzik 2007-10-07 15:11:15 UTC
It is definitely worth testing, yes.  As the branch name implies, the locking is definitely cleaned up (though disable_irq stuff is still there).
Comment 31 Joshua Hoblitt 2007-10-07 17:07:11 UTC
We'll give the fe-lock a try on Monday.  As a side note, we've managed to recreate this issue on Tyan S2912 boards as well so it's not something specific to the MCP firmware that ships with S2927s.
Comment 32 Manfred Spraul 2007-10-08 11:56:48 UTC
Hi Joshua,

two questions:
- do you use net-console? The local_irq handling with poll_controller in both the normal driver and fe-lock driver look suspicious.
- Did you try my test patch? I'm just noisy if there is really a race between disable_irq() and the interrupt handler. If this is true, then the MSI interrupt handler is broken.

thanks,
  Manfred
Comment 33 Joshua Hoblitt 2007-10-08 13:51:18 UTC
Hi Manfred,

To answer your questions:

- We are not using net-console.  The literal kernel cmdline for consoles is "console=tty0 console=ttyS0,115200n8"

- I booted a machine with your patch last Friday but it hasn't fallen over yet.  Hopefully we'll be able to make this happen today.

Cheers
Comment 34 Joshua Hoblitt 2007-10-08 15:18:44 UTC
Hi Manfred,

We've managed to crash a machine twice running your patched up kernel and both times the system has just dead locked without giving an oops on the console.  How important is it to get an oops with that patch?

Cheers
Comment 35 Joshua Hoblitt 2007-10-08 18:17:16 UTC
I pulled from the netdev-2.6 fe-lock branch, copied in my .config, and booted the system on "2.6.23-rc9-g1db49136-dirty".  The system still crashed but it, non emphatically, seems like it took a bit longer than usual to go down.  For the 3rd time in a row we didn't get the oops on the serial console (i'm guessing just bad luck) but I did see this message appear in the dmesg after we put a load on the system.

WARNING: at net/core/dev.c:2154 net_rx_action()

Call Trace:
 <IRQ>  [<ffffffff804c25c9>] net_rx_action+0xc9/0x190
 [<ffffffff8023ac6d>] __do_softirq+0x55/0xc4
 [<ffffffff8020ccfc>] call_softirq+0x1c/0x28
 [<ffffffff8020de3d>] do_softirq+0x2c/0x7d
 [<ffffffff8020e0f9>] do_IRQ+0x13e/0x161
 [<ffffffff8020ae11>] default_idle+0x0/0x3d
 [<ffffffff8020c081>] ret_from_intr+0x0/0xa
 <EOI>  [<ffffffff8020ae3a>] default_idle+0x29/0x3d
 [<ffffffff8020aed9>] cpu_idle+0x8b/0xae
Comment 36 Joshua Hoblitt 2007-10-09 17:21:02 UTC
Hi Manfred,

We're giving your debugging patch another try - we're trying to crash it now.  Here is what it prints into the dmesg right after being loaded.

test 1: ok
test 1: ok done, test 2 start
forcedeth: lock error. Previous owner ffffffff8809ff8f (0), now ffffffff8809ff9b (0)
nv_open+0x5f/0x53a [forcedeth]
nv_open+0x6b/0x53a [forcedeth]
test 2: fail done, test 3 start
forcedeth: lock not held ffffffff8809ffc1 (0)
nv_open+0x6b/0x53a [forcedeth]
test 3: fail done
Comment 37 Joshua Hoblitt 2007-10-11 15:43:27 UTC
Manfred,

Here is the oops message from a system while running your debugging patch.  

Unable to handle kernel NULL pointer dereference at 000000000000007c RIP: 
 [<ffffffff880a059e>] :forcedeth:nv_rx_process_optimized+0xe6/0x36b
eth0: too many iterations (6) in nv_nic_irq.
PGD 2275e7067 PUD 223784067 PMD 0 
Oops: 0000 [1] SMP 
CPU 2 
Modules linked in: autofs4 rtc_cmos rtc_core rtc_lib forcedeth aacraid 3w_9xxx 3w_xxxx atp870u aic7xxx arcmsr tg3 e1000 dm_mirror
Pid: 0, comm: swapper Not tainted 2.6.23-rc7-git1 #3
RIP: 0010:[<ffffffff880a059e>]  [<ffffffff880a059e>] :forcedeth:nv_rx_process_optimized+0xe6/0x36b
RSP: 0018:ffff810229bb3e60  EFLAGS: 00010246
RAX: 0000000014000000 RBX: 0000000000000000 RCX: 0000000004000000
RDX: ffff810227bc7a00 RSI: 000000021f75b810 RDI: ffff8102294f3870
RBP: 0000000034020202 R08: 0000000000000004 R09: ffff810227bc7a00
R10: 0000000000000001 R11: 0000000000000002 R12: ffff810223632740
R13: 0000000000000202 R14: ffff810223632000 R15: 0000000000000001
FS:  00002b13d1deaae0(0000) GS:ffff81022a501600(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000000007c CR3: 0000000216d2c000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff810229bac000, task ffff810229baa000)
Stack:  0000004025224200 ffff810223632740 0000000000000003 ffffc20000c56000
 ffff810223632000 0000000000000001 0000000000000000 ffffffff880a09c3
 ffff810223632740 ffff810223632000 00000000000000ff ffffc20000c56000
Call Trace:
 <IRQ>  [<ffffffff880a09c3>] :forcedeth:nv_nic_irq_optimized+0x89/0x22c
 [<ffffffff880a1534>] :forcedeth:nv_do_nic_poll+0x20f/0x2b1
 [<ffffffff880a1325>] :forcedeth:nv_do_nic_poll+0x0/0x2b1
 [<ffffffff8023ded0>] run_timer_softirq+0x16a/0x1df
 [<ffffffff8023ad09>] __do_softirq+0x55/0xc4
 [<ffffffff8020cd7c>] call_softirq+0x1c/0x28
 [<ffffffff8020debd>] do_softirq+0x2c/0x7d
 [<ffffffff8021bbff>] smp_apic_timer_interrupt+0x49/0x5e
 [<ffffffff8020ae61>] default_idle+0x0/0x3d
 [<ffffffff8020c826>] apic_timer_interrupt+0x66/0x70
 <EOI>  [<ffffffff8020ae8a>] default_idle+0x29/0x3d
 [<ffffffff8020af5a>] cpu_idle+0xbc/0xdf


Code: 8a 43 7c 83 e0 f3 83 c8 04 88 43 7c 83 7b 6c 00 8b 83 a4 00 
RIP  [<ffffffff880a059e>] :forcedeth:nv_rx_process_optimized+0xe6/0x36b
 RSP <ffff810229bb3e60>
CR2: 000000000000007c
Kernel panic - not syncing: Aiee, killing interrupt handler!
Comment 38 Joshua Hoblitt 2007-10-11 15:55:03 UTC
I don't know if this is related to the debugging patch or not but on Tyan S2912 boards occasionally the nic will fail to initialize.

From the dmeseg:
ADDRCONF(NETDEV_UP): eth0: link is not ready

ethtool eth0
Settings for eth0:
        Supported ports: [ MII ]
        Supported link modes:   10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Advertised auto-negotiation: Yes
        Speed: Unknown! (65535)
        Duplex: Unknown! (255)
        Port: MII
        PHYAD: 1
        Transceiver: external
        Auto-negotiation: on
        Supports Wake-on: g
        Wake-on: d
        Link detected: no

Rebooting the systems brings the interfaces back
Comment 39 Joshua Hoblitt 2007-10-11 17:30:53 UTC
Created attachment 13121 [details]
dmesg output from debugging patch
Comment 40 Joshua Hoblitt 2007-10-11 17:35:57 UTC
Hi Manfred,

It looks like we did actually capture the dmesg into the logs before the system went down this time.

Please let us know what the path forward is from here.  This is starting to become an urgent issue.

Also, I've discovered that it's much easier to trigger the oops by reducing the MTU.  I image that this is just because the driver has to service more interupts.  An MTU of 500 seems to be fairly effective.
Comment 41 Manfred Spraul 2007-10-12 10:56:27 UTC
Could you send me forcedeth.o?
I'll try to disassemble the code. All crashes are in the same line, always offset 0x007c. Perhaps it's a simple bug and not a race?
Especially:
>>>
Unable to handle kernel NULL pointer dereference at 000000000000007c RIP: 
 [<ffffffff880a059e>] :forcedeth:nv_rx_process_optimized+0xe6/0x36b
eth0: too many iterations (6) in nv_nic_irq.
>>>
First the oops, then the message about too many iterations. Very odd.
Comment 42 Manfred Spraul 2007-10-12 13:36:13 UTC
Created attachment 13138 [details]
test patch - is skb really NULL?

I've compiled forcedeth.o on my x86-64 system:
It seems there is only one reference to 0x007c in nv_rx_process_optimized:
The skb->ip_summed bitfield. Thus skb seems to be NULL.
Thus np->get_rx_ctx->skb is NULL at the beginning of the packet rx loop.

Joshua, you wrote that you can reproduce the bug fairly good:
Could you apply the attached patch? I just want to confirm my suspicion.

Thanks
Comment 43 Ayaz Abdulla 2007-10-12 13:53:35 UTC
My suspicion was that nv_alloc_rx_optimized was called simultaneously by 2 different callers. That could cause the np->put_rx to be incremented twice and therefore cause an empty rx skb slot.

Thats why I created the patch "New patch for locking" with spin_locks around the allocation routine. But Joshua stated that with that patch the system hit a hard lock...
Comment 44 Paul Price 2007-10-12 15:50:40 UTC
Created attachment 13139 [details]
forcedeth.o

Hi Manfred.

Josh appears to be out sick today, so I'm attaching forcedeth.o for you.  Dunno if you still need it, but trying to help.

-rw-r--r-- 1 root root 58616 Oct  9 13:32 /usr/src/linux/drivers/net/forcedeth.o

Paul.
Comment 45 Joshua Hoblitt 2007-10-12 17:46:21 UTC
Hi Manfred,

I'm actually in the office today.  I applied the skb null test patch on top of your debbuging patch and it looks like we hit pay dirt!  The dmesg is full of errors and every once in awhile the "skb = NULL" printf shows up. Paul tells me that ping is showing packet loss to this system with this patch installed so we are indeed hitting this case. 

Any idea as to why Ayaz's patch didn't fix this before?

nv_do_nic_poll+0xb5/0x2b1 [forcedeth]
nv_nic_irq_optimized_wrap+0x18/0x37 [forcedeth]
rx_process_optimized with np->get_rx_ctx->skb = NULL
np ffff810227854740 np->get_rx.ex ffff8102262c2730 np->put_rx.ex ffff8102262c2540 rx_processed_cnt 1
forcedeth: lock not held ffffffff880a0bd4 (3)
nv_nic_irq_optimized_wrap+0x18/0x37 [forcedeth]
eth0: too many iterations (6) in nv_nic_irq.
forcedeth: lock error. Previous owner ffffffff880a1418 (3), now ffffffff880a0bbc (3)
Comment 46 Joshua Hoblitt 2007-10-12 18:36:10 UTC
Hi Manfred,

It looks like there is another NULL pointer dereference that appears once the first one is papered over.

 Unable to handle kernel NULL pointer dereference at 0000000000000210 RIP: 
 [<ffffffff804bfa6b>] netif_receive_skb+0xf/0x248
PGD 21f906067 PUD 2272d0067 PMD 0 
Oops: 0000 [1] SMP 
CPU 1 
Modules linked in: autofs4 rtc_cmos rtc_core rtc_lib forcedeth aacraid 3w_9xxx 3w_xxxx atp870u aic7xxx arcmsr tg3 e1000 dm_mirror
Pid: 0, comm: swapper Not tainted 2.6.23-rc7-git1 #3
RIP: 0010:[<ffffffff804bfa6b>]  [<ffffffff804bfa6b>] netif_receive_skb+0xf/0x248
RSP: 0018:ffff810229b5be90  EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff81008499c600 RCX: 0000000000000006
RDX: ffff81000103ea88 RSI: ffff81022c211a20 RDI: ffff81008499c600
RBP: ffff81000103eac0 R08: ffff81022a501342 R09: ffff8101795ffd38
R10: 000000000000fc00 R11: ffff8101795ffd38 R12: 0000000000000001
R13: ffff810229b5bf0c R14: ffff81000103ea80 R15: 000000010004cfb6
FS:  00002b7502c5eae0(0000) GS:ffff81022a501400(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000210 CR3: 0000000227312000 CR4: 00000000000006e0
DR0: 000000000  �`
                     0000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff810229b56000, task ffff810229b54000)
Stack:  0000000000000000 ffff81000103eac0 0000000000000001 ffff810229b5bf0c
 ffff81000103ea80 ffffffff804c1d4d 0000004029605400 ffff81000103eac0
 0000000000000000 ffff81000103eaa0 ffff81000103ea80 000000010004cfb6
Call Trace:
 <IRQ>  [<ffffffff804c1d4d>] process_backlog+0x84/0x102
 [<ffffffff804c1f57>] net_rx_action+0xa8/0x1ad
 [<ffffffff8023ad09>] __do_softirq+0x55/0xc4
 [<ffffffff8020cd7c>] call_softirq+0x1c/0x28
 [<ffffffff8020debd>] do_softirq+0x2c/0x7d
 [<ffffffff8021bbff>] smp_apic_timer_interrupt+0x49/0x5e
 [<ffffffff8020ae61>] default_idle+0x0/0x3d
 [<ffffffff8020c826>] apic_timer_interrupt+0x66/0x70
 <EOI>  [<ffffffff8020ae8a>] default_idle+0x29/0x3d
 [<ffffffff8020af5a>] cpu_idle+0xbc/0xdf


Code: 48 83 b8 10 02 00 00 00 74 49 48 8b a8 60 04 00 00 48 85 ed 
RIP  [<ffffffff804bfa6b>] netif_receive_skb+0xf/0x248
 RSP <ffff810229b5be90>
CR2: 0000000000000210
Unable to handle kernel NULL pointer dereference<0>Kernel panic - not syncing: Aiee, killing interrupt handler!
 at 0000000000000210 RIP: 
 [<ffffffff804bfa6b>] netif_receive_skb+0xf/0x248
PGD 223930067 PUD 2263be067 PMD 0 
Oops: 0000 [2] SMP 
CPU 3 
Modules link400
Process syslog-ng (pid: 7097, threadinfo ffff810223aac000, task ffff8102275c7740)
Stack:  0000000000000000 ffff810001050ac0 0000000000000010 ffff810229be7efc
 ffff810001050a80 ffffffff804c1d4d 0000004000000287 ffff810001050ac0
 0000000000000000 ffff810001050aa0 ffff810001050a80 000000010004cfb6
Call Trace:
 <IRQ>  [<ffffffff804c1d4d>] process_backlog+0x84/0x102
 [<ffffffff804c1f57>] net_rx_action+0xa8/0x1ad
 [<ffffffff8023ad09>] __do_softirq+0x55/0xc4
 [<ffffffff8020cd7c>] call_softirq+0x1c/0x28
 [<ffffffff8020debd>] do_softirq+0x2c/0x7d
 [<ffffffff8020e179>] do_IRQ+0x13e/0x161
 [<ffffffff8020c101>] ret_from_intr+0x0/0xa
 <EOI>  [<ffffffff8028f0c5>] do_sys_poll+0x193/0x360
 [<ffffffff8028f0c2>] do_sys_poll+0x190/0x360
 [<ffffffff8028fbaf>] __pollwait+0x0/0xe1
 [<fffffff1c2
 [<ffffffff8028b601>] getname+0x14c/0x191
 [<ffffffff80358881>] _atomic_dec_and_lock+0x39/0x58
 [<ffffffff80286540>] sys_newstat+0x28/0x31
 [<ffffffff8028f2c4>] sys_poll+0x32/0x3b
 [<ffffffff8020bbee>] system_call+0x7e/0x83


Code: 48 83 b8 10 02 00 00 00 74 49 48 8b a8 60 04 00 00 48 85 ed 
RIP  [<ffffffff804bfa6b>] netif_receive_skb+0xf/0x248
 RSP <ffff810229be7e80>
CR2: 0000000000000210
WARNING: at drivers/char/vt.c:3566 do_unblank_screen()

Call Trace:
 <IRQ>  [<ffffffff803a7cb5>] do_unblank_screen+0x3e/0x120
 [<ffffffff8035ea15>] bust_spinlocks+0x19/0x2c
 [<ffffffff80545b9c>] oops_end+0x15/0x57
 [<ffffffff805475c5>] do_page_fault+0x6f0/0x7d1
 [<ffffffff804e7c7e>] tcp_rcv_established+0x117/0x8cd
 [<ffffffff8054580d>] error_exit+0x0/0x84
 [<ffffffff804bfa6b>] ne}%IE
Comment 47 Manfred Spraul 2007-10-13 02:06:59 UTC
(In reply to comment #45)

> Any idea as to why Ayaz's patch didn't fix this before?
> 
> nv_do_nic_poll+0xb5/0x2b1 [forcedeth]
> nv_nic_irq_optimized_wrap+0x18/0x37 [forcedeth]
> rx_process_optimized with np->get_rx_ctx->skb = NULL
> np ffff810227854740 np->get_rx.ex ffff8102262c2730 np->put_rx.ex
> ffff8102262c2540 rx_processed_cnt 1
> forcedeth: lock not held ffffffff880a0bd4 (3)
> nv_nic_irq_optimized_wrap+0x18/0x37 [forcedeth]
> eth0: too many iterations (6) in nv_nic_irq.
> forcedeth: lock error. Previous owner ffffffff880a1418 (3), now
> ffffffff880a0bbc (3)
> 
Could you send me a larger block from the log? 20 lines (preferably the first) skb=NULL message and 20 lines after the message.

Thanks,
  Manfred
Comment 48 Manfred Spraul 2007-10-13 02:39:41 UTC
Created attachment 13141 [details]
bugfix: pci_enable_msi changes pci_dev->irq

Hi Joshua,

I think I found the bug:
pci_enable_msi() changes pci_dev->irq.
Half of the driver used the new value, half used a stale copy in dev->irq.

Could you try the attached patch? It updates the copy in dev->irq [for users outside forcedeth.c] and replaces all accesses in forcedeth.c with direct acccesses of pci_dev->irq.
Comment 49 Joshua Hoblitt 2007-10-15 16:16:04 UTC
Hi Manfred,

We tested your latest patch ontop of 2.6.23-rc7-git1 + your two other debugging patches and after an hour of testing it appears that the oops is fixed! We're about to try some more extensive testing with this patch against 2.6.23 (the patch applies cleanly) with CONFIG_FORCEDETH_NAPI set as this is how we'd like to run the driver in production. I'll report back the results one way or the other.
Comment 50 Joshua Hoblitt 2007-10-16 13:31:16 UTC
Hi Manfred,

Thus far it appears that this issue is resolved.  I'm going to be rolling out this patch to to 7 more machines today.  If all goes well I'll ready to stick a "Tested-by:" on this patch by the end of the today.  I would also really like to see it submitted for 2.6.23.y.  Do you feel that it is in good enough shape for the stable series or should I submit it to be part of my distribution (gentoo) kernel patch set for awhile?
Comment 51 Joshua Hoblitt 2007-10-16 16:52:55 UTC
It looks like there may still be a problem.  We got another crash but without an oops on the console.  We are attempting to recreate the failure.
Comment 52 Joshua Hoblitt 2007-10-17 14:16:49 UTC
We have two different models of motherboard with the MCP55. Tyan S2912s and S2927s.  We started debugging this issue on a S2927 (most of the bug reports) but eventually switched to a S2912 with an IPMI card installed as the original system is not on remote managed power and power cycling it by hand was becoming tedious.  

It seems that the way IRQs are setup on these motherboards is slightly different. Here is the dmesg from the S2912 system:

forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.
ACPI: PCI Interrupt Link [LMAC] enabled at IRQ 18
ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LMAC] -> GSI 18 (level, high) -> IR
Q 18
PCI: Setting latency timer of device 0000:00:08.0 to 64
forcedeth: using HIGHDMA
eth0: forcedeth.c: subsystem: 010f1:2912 bound to 0000:00:08.0
ACPI: PCI Interrupt Link [LMA2] enabled at IRQ 17
ACPI: PCI Interrupt 0000:00:09.0[A] -> Link [LMA2] -> GSI 17 (level, high) -> IR
Q 17
PCI: Setting latency timer of device 0000:00:09.0 to 64
forcedeth: using HIGHDMA
eth1: forcedeth.c: subsystem: 010f1:2912 bound to 0000:00:09.0

And here is the dmesg from the S2927 system:

forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.
ACPI: PCI Interrupt Link [LMAC] enabled at IRQ 22
ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LMAC] -> GSI 22 (level, low) -> IRQ
 22
PCI: Setting latency timer of device 0000:00:08.0 to 64
forcedeth: using HIGHDMA
eth0: forcedeth.c: subsystem: 010de:cb84 bound to 0000:00:08.0
ACPI: PCI Interrupt Link [LMAD] enabled at IRQ 21
ACPI: PCI Interrupt 0000:00:09.0[A] -> Link [LMAD] -> GSI 21 (level, low) -> IRQ
 21
PCI: Setting latency timer of device 0000:00:09.0 to 64
forcedeth: using HIGHDMA
eth1: forcedeth.c: subsystem: 010de:cb84 bound to 0000:00:09.0

Both systems are running the exact same kernel and modules (rsynced to both from a 3rd machine) and neither system is passing any options to the forcedeth module.

We have been unable to get the S2912 system to crash with Manfred's latest patch on it and we've been hammering on it pretty hard.  Where as we can still fairly easily push the S2927 system over.
Comment 53 Joshua Hoblitt 2007-10-19 16:40:58 UTC
Created attachment 13214 [details]
Manfred's patch with trivial changes need to apply to netdev-2.6
Comment 54 Joshua Hoblitt 2007-10-24 21:03:31 UTC
I see this patch has been committed to netdev... does that mean there is no interested in resolving these issues on the s2927 boards?
Comment 55 Manfred Spraul 2007-10-29 12:15:31 UTC
Yes, we will try to fix the s2927 boards, too.
My patch was applied immediately, because it fixes an obvious bug.
Do the S2912 boards use MSI interrupt or does that board use MSI-X?
Comment 56 Joshua Hoblitt 2007-10-29 17:29:30 UTC
It appears to be MSI for both.  I'm attaching the /proc/interupts from both systems.
Comment 57 Joshua Hoblitt 2007-10-29 17:30:03 UTC
Created attachment 13328 [details]
s2912 /proc/interupts
Comment 58 Joshua Hoblitt 2007-10-29 17:30:30 UTC
Created attachment 13329 [details]
s2927 /proc/interupts
Comment 59 Joshua Hoblitt 2007-11-06 13:35:46 UTC
Can I provide any other information?
Comment 60 Joshua Hoblitt 2007-12-04 18:58:29 UTC
Just an FYI, we can no longer help debug this issue on the Tyan 2927s.  We've replaced all 17 of our S2927s with S2912s.
Comment 61 Anton Kuzmin 2007-12-16 04:57:23 UTC
I am also experiencing this problem (even with the latest 2.6.24-rc5).
Under the heavy load of azureus bittorrent downloading/uploading, I notice a lot of lines like this in dmesg:
   eth1: too many iterations (6) in nv_nic_irq.

I don't know about 2.6.24-rc5 yet, but on 2.6.22 after a while my network connection stopped and even reloading of forcedeth module did not help.

What should I do to eliminate this behaviour and help with fixing the bug?
Comment 62 Joshua Hoblitt 2008-01-30 17:34:13 UTC
I'm seeing the exactly same thing as Anton Kuzmin on the S2912s even under 2.6.24.  The heavier the load the faster the network dies (without an oops) and the system needs to be rebooted to recover the interface.  Loadning/unloading the forcedeth modules can cause the system to hang.
Comment 63 Andreas Girlich 2008-02-04 08:40:48 UTC
Yes, this bug is really annoying. We're running S2927 boards and having exactly the same problems. Is there any official statement for a work-around or bug fix being developed on?
How can we provide useful information?
Comment 64 Francois Cartegnie 2008-02-07 07:18:01 UTC
I have similar problem on a MCP61 MSI board.
I don't use the nic because of these reset nic/crash problems, but even without traffic and just the module loaded, the card makes the system hang on...shutdown.
That's related to a skb_free.
I'll try to take a screenshot.
Comment 65 Francois Cartegnie 2008-02-07 07:21:51 UTC
Please also read link in my comments on:
http://bugzilla.kernel.org/show_bug.cgi?id=8700
Comment 66 Ha Quoc Viet 2008-03-11 10:08:47 UTC
I have posted a bug regarding this same issue, before seeing this thread :(

I have a tyan board with two dual core opterons 285 and nivida chipset.
I am using both gigabit adaptors (forcedeth), one on the local lan (gigabit),
the other on adsl.
the second "dumps tx registers" every so often, on
2.6.22
2.6.23
2.6.24
2.6.24.3
while the other (with the highest nfs bandwidth) stays put
Comment 67 Joshua Hoblitt 2008-03-19 15:03:28 UTC
As noted above, the only solution seems to be to use another NIC.  We've switched to ordering systems with a different southbridge/integrated NIC as well.  I guess the netdev folks are just as baffled by this as we are as I haven't heard anything since Oct.

Has anyone tested with the latest netdev sources?  There appears to be a patch in there related to reset problems.
Comment 68 Joshua Hoblitt 2008-08-18 16:57:14 UTC
The problems MCP55 continue until at least 2.6.24.2 with our inhouse testing.  Does anyone know of a known working version of forcedeth.c?  It looks like we may be forced to try 2.6.26 in production as this issue is so difficult to trip with artificial testing.
Comment 69 Ayaz Abdulla 2008-08-18 17:21:48 UTC
Hi Joshua,

I can help debug this issue. Lets try to get a baseline configuration working and then we can start debugging with more features. Can you try using the latest forcedeth driver from Jeff Garzik's netdev branch with msi turned off and napi turned off?
Comment 70 Joshua Hoblitt 2008-08-18 18:25:37 UTC
Ayaz.  Sure, I just cloned the tree and made a build.  Do you think that these issues are already resolved?
Comment 71 Ayaz Abdulla 2008-08-19 15:18:58 UTC
There was a MSI fix recently. As for NAPI, I believe there is a race condition for turning on/off interrupts between the 3 threads: nv_nic_irq_optimized, nv_do_nic_poll and nv_do_napi_poll. I am in the midst of fixing that race.
Comment 72 Joshua Hoblitt 2008-08-19 19:36:15 UTC
I couldn't get the rtc working with the netdev git tree I pulled yesterday... I just repulled the tree and will try again.  We need a working rtc so we can test in our production environment.
Comment 73 Joshua Hoblitt 2008-09-22 13:02:08 UTC
Hi Ayaz,

I've got a build of 2.6.27-rc5 from netdev-2.6 running on the systems that seems to have eliminated a large majority of the problems we were attributing to forcedeth.c.  Unfortunately this required fighting a number of other issues (see bug #11388 & bug #11543).  There are still other problems going on that may still be rooted in forcedeth.c  Currently, are all of the forcedeth.c patches from netdev merged into linus/2.6.27-rc7?  I'd like to switch over to the tip tree for debugging purposes but haven't been able to get a happy merge of the netdev & tip tree with git.
Comment 74 Joshua Hoblitt 2008-09-25 13:34:58 UTC
Created attachment 18048 [details]
2.6.27-rc5 (netdev-2.6) .config
Comment 75 Joshua Hoblitt 2008-09-25 13:39:29 UTC
Hi Ayaz,

I think that 2.6.27-rc5 has more or less ended our forcedeth.c NIC deaths and system crashes.  NAPI is off and we are loading the driver with these options:

options forcedeth msi=0 msix=0

The dmesg is overflowing with this message:

1110687.465324] eth0: too many iterations (6) in nv_nic_irq.
[1110709.082624] eth0: too many iterations (6) in nv_nic_irq.
[1110752.755563] eth0: too many iterations (6) in nv_nic_irq.
[1110754.200284] eth0: too many iterations (6) in nv_nic_irq.

As we haven't upped the number of allowable packets as in the past this has caused the system crashes to get even worse.  Where do you suggest we go from here?
Comment 76 Ayaz Abdulla 2008-09-26 15:43:09 UTC
There is a patch submitted by Joe Korty regarding bumping up the max_interrupt_work, so it would be worthwhile to try it out on top of the current working changes you have.

Here is the patch:
From: Joe Korty <joe.korty@ccur.com>

This eliminates the following often-generated warning from my 64 bit
Opteron SMP test stand:

	eth0: too many iterations (6) in nv_nic_irq

According to the web, the problem is that the forcedeth driver has a
too-low value for max_interrupt_work.  Grepping the kernel I see that
forcedeth has the second lowest value of all ethernet drivers (ie, 6). 
Most are in the 20-40 range.  So this patch increases this a bit, from 6
to 15 (at 15 forcedeth becomes the driver with third-lowest
max_interrupt_work value).

My test stand, which used to print out the above warnings repetitively
whenever it was under heavy net load, no longer does so.

Signed-off-by: Joe Korty <joe.korty@ccur.com>
Cc: Ayaz Abdulla <aabdulla@nvidia.com>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/net/forcedeth.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -puN drivers/net/forcedeth.c~forcdeth-increase-max_interrupt_work drivers/net/forcedeth.c
--- a/drivers/net/forcedeth.c~forcdeth-increase-max_interrupt_work
+++ a/drivers/net/forcedeth.c
@@ -818,7 +818,7 @@ struct fe_priv {
  * Maximum number of loops until we assume that a bit in the irq mask
  * is stuck. Overridable with module param.
  */
-static int max_interrupt_work = 5;
+static int max_interrupt_work = 15;
 
 /*
  * Optimization can be either throuput mode or cpu mode
_
Comment 77 Joshua Hoblitt 2008-09-30 14:33:33 UTC
At least for us, max_interupt_work = 15 isn't enough.  Even with a modest load the dmesg is filling up with:

eth0: too many iterations (16) in nv_nic_irq.

That's not to say that this patch isn't the right thing to do.  The driver options we are using are:

options forcedeth msi=0 msix=0 max_interrupt_work=15
Comment 78 Joshua Hoblitt 2008-10-01 14:53:56 UTC
I see the max_interrupt_work patch hasn't been commited to netdev-2.6.  I'd suggest holding off on it for a little bit.  I'm going to give max_interrupt_work=30 a try in production.
Comment 79 Joshua Hoblitt 2008-10-07 15:27:13 UTC
It looks like max_interrupt_work=30 almost completely (but not completely) eliminated the :

[244205.344674] eth0: too many iterations (31) in nv_nic_irq.

entries in the dmesg.  My recommendation is that this patch be revised and committed. We will probably experiment with slightly higher values over the next few weeks.
Comment 80 Joshua Hoblitt 2008-10-13 18:21:49 UTC
Can someone take a look at bug #11543 and determine if it is related to forcedeth.c?