Bug 9047
| Summary: | (net forcedeth) MCP55 oops under heavy load | | |
|---|---|---|---|
| Product: | Drivers | Reporter: | Joshua Hoblitt (j_kernel) |
| Component: | Network | Assignee: | Ayaz Abdulla (aabdulla) |
| Status: | CLOSED CODE_FIX | | |
| Severity: | high | CC: | a.girlich, j_kernel, kernel, manfred, price, selecter, viet, yyyeer.bo |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 2.6.26 | Subsystem: | |
| Regression: | --- | Bisected commit-id: | |

Attachments:
- .config
- /proc/iomem
- /proc/ioports
- lspci -vvv
- /proc/meminfo
- /proc/cpuinfo
- forcedeth.c
- Add timer lock
- New patch for locking
- lspci -v -xxx
- debug timer callbacks and interrupts
- dmesg output from debugging patch
- test patch - is skb really NULL?
- forcedeth.o
- bugfix: pci_enable_msi changes pci_dev->irq
- Manfred's patch with trivial changes need to apply to netdev-2.6
- s2912 /proc/interupts
- s2927 /proc/interupts
- 2.6.27-rc5 (netdev-2.6) .config
Description
Joshua Hoblitt
2007-09-20 21:22:16 UTC
Created attachment 12891 [details]
.config
Created attachment 12892 [details]
/proc/iomem
Created attachment 12893 [details]
/proc/ioports
Created attachment 12894 [details]
lspci -vvv
Created attachment 12895 [details]
/proc/meminfo
Created attachment 12896 [details]
/proc/cpuinfo
1) Can you try with the msi=0 msix=0 module parameters when you load the driver?
2) Can you try with the max_interrupt_work module parameter set to a really high value? i.e. 500

I've tested both #1 and #2 per your request. The executive summary is that #1, #2, and #1 & #2 together all prevent the oops. My naive impression is that #2 is the better fix, as it leaves MSI enabled (I assume that is superior to INTx emulation) while preventing the oops, and since these are 4-way SMP systems the extra latency in the driver shouldn't be a killer. Is there a means of high-water marking the number of events that are queued up? Why does the event queue filling up cause an oops?

# modprobe forcedeth msi=0 msix=0
ACPI: PCI interrupt for device 0000:00:09.0 disabled
ACPI: PCI interrupt for device 0000:00:08.0 disabled
forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.
ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LMAC] -> GSI 22 (level, low) -> IRQ 22
PCI: Setting latency timer of device 0000:00:08.0 to 64
forcedeth: using HIGHDMA
eth0: forcedeth.c: subsystem: 010de:cb84 bound to 0000:00:08.0
ACPI: PCI Interrupt 0000:00:09.0[A] -> Link [LMAD] -> GSI 21 (level, low) -> IRQ 21
PCI: Setting latency timer of device 0000:00:09.0 to 64
forcedeth: using HIGHDMA
eth1: forcedeth.c: subsystem: 010de:cb84 bound to 0000:00:09.0
eth0: no IPv6 routers present
eth0: no IPv6 routers present

After running with heavy NFS load for ~30 mins:

dmesg | grep "eth0: too many iterations (6) in nv_nic_irq." | wc -l
700

# modprobe forcedeth msi=0 msix=0 max_interrupt_work=500
ACPI: PCI interrupt for device 0000:00:09.0 disabled
ACPI: PCI interrupt for device 0000:00:08.0 disabled
forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.
ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LMAC] -> GSI 22 (level, low) -> IRQ 22
PCI: Setting latency timer of device 0000:00:08.0 to 64
forcedeth: using HIGHDMA
eth0: forcedeth.c: subsystem: 010de:cb84 bound to 0000:00:08.0
ACPI: PCI Interrupt 0000:00:09.0[A] -> Link [LMAD] -> GSI 21 (level, low) -> IRQ 21
PCI: Setting latency timer of device 0000:00:09.0 to 64
forcedeth: using HIGHDMA
eth1: forcedeth.c: subsystem: 010de:cb84 bound to 0000:00:09.0
eth0: no IPv6 routers present
eth0: no IPv6 routers present

Running with heavy NFS load for ~30 mins: no new dmesg entries.

# modprobe forcedeth max_interrupt_work=500
ACPI: PCI interrupt for device 0000:00:09.0 disabled
ACPI: PCI interrupt for device 0000:00:08.0 disabled
forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.
ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LMAC] -> GSI 22 (level, low) -> IRQ 22
PCI: Setting latency timer of device 0000:00:08.0 to 64
forcedeth: using HIGHDMA
eth0: forcedeth.c: subsystem: 010de:cb84 bound to 0000:00:08.0
ACPI: PCI Interrupt 0000:00:09.0[A] -> Link [LMAD] -> GSI 21 (level, low) -> IRQ 21
PCI: Setting latency timer of device 0000:00:09.0 to 64
forcedeth: using HIGHDMA
eth1: forcedeth.c: subsystem: 010de:cb84 bound to 0000:00:09.0
eth0: no IPv6 routers present

The "ACPI: PCI interrupt for device 0000:00:09.0 disabled" lines made me suspect that the MSI-disabled flags were sticking around in the hardware, so I set "options forcedeth max_interrupt_work=500" and rebooted the system to get the hardware back to its default state. The boot dmesg omitted the "disabled" lines:

forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.
ACPI: PCI Interrupt Link [LMAC] enabled at IRQ 22
ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LMAC] -> GSI 22 (level, low) -> IRQ 22
PCI: Setting latency timer of device 0000:00:08.0 to 64
forcedeth: using HIGHDMA
eth0: forcedeth.c: subsystem: 010de:cb84 bound to 0000:00:08.0
ACPI: PCI Interrupt Link [LMAD] enabled at IRQ 21
ACPI: PCI Interrupt 0000:00:09.0[A] -> Link [LMAD] -> GSI 21 (level, low) -> IRQ 21
PCI: Setting latency timer of device 0000:00:09.0 to 64
forcedeth: using HIGHDMA
eth1: forcedeth.c: subsystem: 010de:cb84 bound to 0000:00:09.0

This configuration seems to be stable under heavy NFS load with no unusual dmesg entries. Note that all testing was with the following sysctls explicitly set:

net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.core.rmem_max = 16777216
net.core.rmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 87380 16777216
net.core.netdev_max_backlog = 2500

Can you verify in the original setup whether MSI or MSI-X was set up? You can do "cat /proc/interrupts" to verify this. It will say MSI or MSIX or regular interrupts. I have a hunch on why the network dies without the oops. I still need to investigate the "oops".

(This is a copy of an email I sent directly to Ayaz this morning while bugzilla was down.) The kernel.org bugzilla went down as I was responding. Here is the /proc/interrupts (I thought I'd sent that in with the original bug report, but I guess not).

# cat /proc/interrupts
            CPU0       CPU1       CPU2       CPU3
   0:       1265      37207      65587   72857808   IO-APIC-edge      timer
   1:          0          0          0          8   IO-APIC-edge      i8042
   4:       1474       1304       3994     171125   IO-APIC-edge      serial
   6:          0          0          0          2   IO-APIC-edge      floppy
   8:          0          0          0          1   IO-APIC-edge      rtc
   9:          0          0          0          0   IO-APIC-fasteoi   acpi
  14:          0          0          0         59   IO-APIC-edge      ide0
  15:          0          0          0          0   IO-APIC-edge      libata
  19:     182100     373567      72387    3712628   IO-APIC-fasteoi   arcmsr
  20:          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb1
  21:          0          0          0          0   IO-APIC-fasteoi   sata_nv
  22:          0          0          0          0   IO-APIC-fasteoi   sata_nv
  23:          0          0          0          0   IO-APIC-fasteoi   sata_nv, ohci_hcd:usb2
1276:      33167   62789742     665173  235921228   PCI-MSI-edge      eth0
 NMI:          0          0          0          0
 LOC:   72961951   72961997   72961953   72961895
 ERR:          0

If the interrupt distribution looks strange, it's because I started up the irqbalance daemon at the end of business on Friday, thinking that the system was stable. The system is, however, not stable. As of this morning the kernel was responding to ICMP (echo requests), but other than that you cannot open sockets to/from the system. The serial console is still responding, but the system is very unhappy because of hung NFS mounts. I guess the upside is that the system ran for a decent period with a heavy network load before failing:

eth0      Link encap:Ethernet  HWaddr 00:E0:81:76:5F:52
          inet addr:128.171.227.106  Bcast:128.171.227.255  Mask:255.255.255.0
          inet6 addr: fe80::2e0:81ff:fe76:5f52/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:199898797 errors:434 dropped:0 overruns:0 frame:434
          TX packets:160862503 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:87001054754 (82970.6 Mb)  TX bytes:338307355734 (322635.0 Mb)
          Interrupt:22

There was nothing of interest in the dmesg except NFS timeouts. "rmmod forcedeth; modprobe forcedeth" brought the network right back up. -J

Can you send me your forcedeth.c file so that I can create a patch for you to try?

Created attachment 12942 [details]
forcedeth.c
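(Editor's note: the "too many iterations (N) in nv_nic_irq" messages quoted above come from the driver's bounded interrupt-work loop, which is what the max_interrupt_work parameter raises. The sketch below is illustrative only, not the forcedeth source; nv_nic_irq_sketch, read_irq_status, clear_irq_status and handle_one_batch are placeholder names.)

```c
#include <linux/interrupt.h>
#include <linux/netdevice.h>

/* Illustrative only -- not the actual forcedeth code. The driver services
 * NIC events in a loop inside its IRQ handler and gives up after
 * max_interrupt_work iterations so a stuck event bit cannot wedge the CPU;
 * that is the point where "too many iterations (N) in nv_nic_irq" is logged. */
static int max_interrupt_work = 5;	/* the module parameter raised to 500 above */

static irqreturn_t nv_nic_irq_sketch(int irq, void *data)
{
	struct net_device *dev = data;
	int i;

	for (i = 0; ; i++) {
		u32 events = read_irq_status(dev);	/* placeholder: read NIC event bits */
		if (!events)
			break;				/* nothing left to service */
		clear_irq_status(dev, events);		/* placeholder: ack the events */
		handle_one_batch(dev, events);		/* placeholder: tx/rx work */

		if (i > max_interrupt_work) {
			printk(KERN_DEBUG "%s: too many iterations (%d) in nv_nic_irq.\n",
			       dev->name, i);
			/* The real driver also masks the interrupt here and
			 * re-enables it later from a timer. */
			break;
		}
	}
	return IRQ_HANDLED;
}
```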
Created attachment 12949 [details]
Add timer lock
Here is a potential fix for the Oops. Can you give it a try?
I patched the same 2.6.23-rc7-git1 kernel with the patch, rebuilt, removed all options passed to the forcedeth driver, and rebooted for good measure to make sure the hardware was in a clean state. It ran for ~10 mins under heavy load before giving up with another oops. Unfortunately it looks like the serial console went dead before we got the entire backtrace, which is pretty consistent with the old oops behavior.

Unable to handle kernel NULL pointer dereference at 000000000000007c
RIP: [<ffffffff880a047d>] :forcedeth:nv_rx_process_optimized+0xe6/0x36b
PGD 11f01d067 PUD 122138067 PMD 0
Oops: 0000 [1] SMP
CPU 1
Modules linked in: i2c_core autofs4 smsc37b787_wdt rtc_cmos rtc_core rtc_lib forcedeth aacraid 3w_9xxx 3w_xxxx atp870u aic7xxx arcmsr tg3 e1000 dm_mirror
Pid: 0, comm: swapper Not tainted 2.6.23-rc7-git1 #2
RIP: 0010:[<ffffffff880a047d>] [<ffffffff880a047d>] :forcedeth:nv_rx_process_optimized+0xe6/0x36b
RSP: 0018:ffff810224667e60 EFLAGS: 00010046
RAX: 0000000014000000 RBX: 0000000000000000 RCX: 00000000d8000000
RDX: ffff810096e78f00 RSI: 00000000b4f64010 RDI: ffff810224076070
RBP: 0000000034020042 R08: 0000000000000000 R09: ffff810096e78f00
R10: 0000000000000001 R11: 0000000000000002 R12: ffff8101226ca740
R13: 0000000000000042 R14: ffff8101226ca000 R15: 0000000000000001
FS: 00002b9bd0b85380(0000) GS:ffff810123f01500(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000000007c CR3: 000000011f036000 CR4: 00000000000006e0

Also worth noting, we've replicated this oops on a completely different system with the same Tyan S2927 motherboard, so I think we've pretty much ruled out hardware problems (other than an inherent design flaw) here.

Per request, I changed the module's version number to prove that the patched version is being loaded:

forcedeth.c: Reverse Engineered nForce ethernet driver. Version 3.1415.

We got the full backtrace on the panic this time, though.
Unable to handle kernel NULL pointer dereference at 000000000000007c RIP: [<ffffffff880a047d>] :forcedeth:nv_rx_process_optimized+0xe6/0x36b PGD 21f1a2067 PUD 21f1b9067 PMD 0 Oops: 0000 [1] SMP CPU 1 Modules linked in: i2c_core autofs4 smsc37b787_wdt rtc_cmos rtc_core rtc_lib forcedeth aacraid 3w_9xxx 3w_xxxx atp870u aic7xxx arcmsr tg3 e1000 dm_mirror Pid: 7813, comm: nfsd Not tainted 2.6.23-rc7-git1 #2 RIP: 0010:[<ffffffff880a047d>] [<ffffffff880a047d>] :forcedeth:nv_rx_process_optimized+0xe6/0x36b RSP: 0018:ffff810224667eb8 EFLAGS: 00010246 RAX: 0000000014000000 RBX: 0000000000000000 RCX: 00000000d8000000 RDX: ffffffff8027df65 RSI: 0000000173737810 RDI: ffff810224700070 RBP: 00000000340205ea R08: ffff81022047dd28 R09: ffffffff802c7580 R10: ffff810121b471a0 R11: 00000000000011b1 R12: ffff8101224c7740 R13: 00000000000005ea R14: ffff8101224c7000 R15: 0000000000000001 FS: 00002b97dd5a36d0(0000) GS:ffff810123f01500(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 000000000000007c CR3: 000000021f7e7000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process nfsd (pid: 7813, threadinfo ffff81022047c000, task ffff81022097f740) Stack: 00000040224c7740 ffff8101224c7740 0000000000000010 ffffc20002760000 ffff8101224c7000 0000000000000000 0000000000002000 ffffffff880a08a2 ffff810122056740 0000000000000000 0000000000000000 00000000000004fc Call Trace: <IRQ> [<ffffffff880a08a2>] :forcedeth:nv_nic_irq_optimized+0x89/0x22c [<ffffffff80259828>] handle_IRQ_event+0x25/0x53 [<ffffffff8025ae2c>] handle_edge_irq+0xe4/0x127 [<ffffffff8020e12c>] do_IRQ+0xf1/0x161 [<ffffffff802c7580>] ext3_file_write+0x0/0x94 [<ffffffff8020c101>] ret_from_intr+0x0/0xa <EOI> [<ffffffff802c7580>] ext3_file_write+0x0/0x94 [<ffffffff802834db>] do_sync_readv_writev+0xb7/0x107 [<ffffffff80294959>] iput+0x42/0x7b [<ffffffff80247a06>] autoremove_wake_function+0x0/0x2e [<ffffffff8053ff5f>] cache_check+0x64/0x433 [<ffffffff803210f1>] exp_find_key+0x96/0xa9 [<ffffffff8022c71f>] enqueue_task+0x3c/0x4f [<ffffffff80283379>] rw_copy_check_uvector+0x6c/0xdc [<ffffffff80283b2d>] do_readv_writev+0xb2/0x18b [<ffffffff8031c105>] nfsd_acceptable+0x0/0xca [<ffffffff8031cfd5>] nfsd_permission+0x23/0xf1 [<ffffffff8031d60a>] nfsd_vfs_write+0xe9/0x2dd [<ffffffff80282170>] __dentry_open+0x101/0x1aa [<ffffffff8031dc9a>] nfsd_open+0x13c/0x170 [<ffffffff8031ded8>] nfsd_write+0xc5/0xe2 [<ffffffff803253df>] nfsd3_proc_write+0xfb/0x11a [<ffffffff80319a46>] nfsd_dispatch+0xde/0x1e5 [<ffffffff80539752>] svc_process+0x3df/0x6ed [<ffffffff8054a89b>] __down_read+0x12/0x9a [<ffffffff8031a060>] nfsd+0x1a1/0x2b9 [<ffffffff8020ca08>] child_rip+0xa/0x12 [<ffffffff80319ebf>] nfsd+0x0/0x2b9 [<ffffffff8020c9fe>] child_rip+0x0/0x12 Code: 8a 43 7c 83 e0 f3 83 c8 04 88 43 7c 83 7b 6c 00 8b 83 a8 00 RIP [<ffffffff880a047d>] :forcedeth:nv_rx_process_optimized+0xe6/0x36b RSP <ffff810224667eb8> CR2: 000000000000007c Kernel panic - not syncing: Aiee, killing interrupt handler! Created attachment 12961 [details]
New patch for locking
Can you try out this new patch? It protects all access to rx_process and alloc_rx for non-NAPI.
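(Editor's note: a minimal sketch of the locking idea described above, assuming the race is between the IRQ handler and the nic-poll timer both walking the rx ring; every name with a _sketch suffix is a placeholder, and this is not the attached patch.)

```c
#include <linux/spinlock.h>
#include <linux/netdevice.h>

/* Sketch of the non-NAPI locking idea: both paths that can walk the rx ring
 * (the interrupt handler and the nv_do_nic_poll timer) take the same lock,
 * so descriptors cannot be consumed or refilled concurrently and np->put_rx
 * cannot be advanced twice for one slot. */
struct fe_priv_sketch {
	spinlock_t rx_lock;		/* serializes all rx ring access */
	/* ... rx ring pointers, skb context, ... */
};

static void nv_rx_path_sketch(struct net_device *dev, struct fe_priv_sketch *np, int limit)
{
	unsigned long flags;

	spin_lock_irqsave(&np->rx_lock, flags);
	nv_rx_process_sketch(dev, limit);	/* stands in for nv_rx_process_optimized() */
	nv_alloc_rx_sketch(dev);		/* stands in for nv_alloc_rx_optimized() */
	spin_unlock_irqrestore(&np->rx_lock, flags);
}
```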
This patch appears to be a step in the right direction. I applied the patch to clean -rc7-git1 sources, installed the modules, and rebooted. I again changed the version number in the module to prove that the correct driver is being loaded on boot:

forcedeth.c: Reverse Engineered nForce ethernet driver. Version 6.283.
ACPI: PCI Interrupt Link [LMAC] enabled at IRQ 22
ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LMAC] -> GSI 22 (level, low) -> IRQ 22
PCI: Setting latency timer of device 0000:00:08.0 to 64
forcedeth: using HIGHDMA
eth0: forcedeth.c: subsystem: 010de:cb84 bound to 0000:00:08.0
ACPI: PCI Interrupt Link [LMAD] enabled at IRQ 21
ACPI: PCI Interrupt 0000:00:09.0[A] -> Link [LMAD] -> GSI 21 (level, low) -> IRQ 21
PCI: Setting latency timer of device 0000:00:09.0 to 64
forcedeth: using HIGHDMA
eth1: forcedeth.c: subsystem: 010de:cb84 bound to 0000:00:09.0

The system seemed to fare pretty well this time, with the familiar messages showing up in the dmesg under heavy load:

dmesg | grep "eth0: too many iterations (6) in nv_nic_irq." | wc -l
1280

The "standard" testing load we were using before wasn't enough to push the machine over. We had to up the NFS client load from ~900 Mbit/s to ~1 Gbit/s before the network interface died. It appears that the oops is gone, though. We are going to run this test a few more times.

When the network interface dies, can you see if the interrupt count is incrementing from 'cat /proc/interrupts'? Can you also verify whether either the tx or rx counters are incrementing from 'ifconfig'? Also, please send the output of "ethtool -d ethX". Thanks.

Will do. We seem to be having trouble getting it to fall over again - this is by far the longest it's ever made it.

dmesg | grep "eth0: too many iterations (6) in nv_nic_irq." | wc -l
4482

Of course, within seconds of my posting that, the system hard locked. We didn't even get the oops on the serial console. It just completely froze. Just out of curiosity, is it worth trying CONFIG_FORCEDETH_NAPI, since in theory this will reduce the number of hardware interrupts?

You could give it a try as a data point, but the only difference is that rx processing is done through a NAPI handler instead of within the IRQ handler. Also, could you send me the verbose output of all PCI config data at any time the system has the driver loaded? i.e. lspci -v -xxx

Created attachment 12984 [details]
lspci -v -xxx
Welp, it crashes with NAPI enabled as well. The last few crashes w/o NAPI have been deadlocks without even an oops. How important is it to look at the system while it's in the "dead NIC but system still up" failure mode? It seems to be a random scatter of failure modes between oops, partial oops, deadlock, and dead NIC.

In an attempt to get these systems into a usable state we tried setting max_interrupt_work to an insanely high value. The issue still persists at 5k, and it looks like 50k is high enough to cause problems of its own (but we are testing this setting for stability right now)...

options forcedeth msi=0 msix=0 max_interrupt_work=50000

WARNING: at kernel/softirq.c:139 local_bh_enable()
Call Trace: <IRQ> [<ffffffff8023abd0>] local_bh_enable+0x3a/0x91
[<ffffffff804b99a0>] sock_queue_rcv_skb+0xed/0x105
[<ffffffff804f868a>] udp_queue_rcv_skb+0x130/0x1f4
[<ffffffff804f8c2d>] __udp4_lib_rcv+0x4df/0x749
[<ffffffff804da87c>] ip_local_deliver+0xca/0x14c
[<ffffffff804dadbd>] ip_rcv+0x4bf/0x502
[<ffffffff8022c381>] __wake_up_common+0x3e/0x68
[<ffffffff880a1f84>] :forcedeth:nv_napi_poll+0x567/0x729
[<ffffffff804c23c0>] net_rx_action+0xa8/0x1ad
[<ffffffff8809fe70>] :forcedeth:nv_nic_irq_optimized+0xa5/0x237
[<ffffffff8023ad09>] __do_softirq+0x55/0xc4
[<ffffffff8020cd7c>] call_softirq+0x1c/0x28
[<ffffffff8020debd>] do_softirq+0x2c/0x7d
[<ffffffff8020e179>] do_IRQ+0x13e/0x161
[<ffffffff8020ae61>] default_idle+0x0/0x3d
[<ffffffff8020c101>] ret_from_intr+0x0/0xa <EOI>
[<ffffffff8020ae8a>] default_idle+0x29/0x3d
[<ffffffff8020af5a>] cpu_idle+0xbc/0xdf

Created attachment 13029 [details]
debug timer callbacks and interrupts
Hi,
I'm trying to understand why the spinlock fixed the problem. Therefore I've written a debug patch that searches for a race.
Could you apply the attached patch and run it, without any special options (i.e.: try to provoke a crash).
The patch is against the forcedeth.c file that you posted.
Thanks,
Manfred
I think you mean "why the spinlock didn't fix the problem"? I will try this patch and post the results. Jeff, Is there anything in git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git fe-lock that might help resolve this issue? It is definitely worth testing, yes. As the branch name implies, the locking is definitely cleaned up (though disable_irq stuff is still there). We'll give the fe-lock a try on Monday. As a side note, we've managed to recreate this issue on Tyan S2912 boards as well so it's not something specific to the MCP firmware that ships with S2927s. Hi Joshua, two questions: - do you use net-console? The local_irq handling with poll_controller in both the normal driver and fe-lock driver look suspicious. - Did you try my test patch? I'm just noisy if there is really a race between disable_irq() and the interrupt handler. If this is true, then the MSI interrupt handler is broken. thanks, Manfred Hi Manfred, To answer your questions: - We are not using net-console. The literal kernel cmdline for consoles is "console=tty0 console=ttyS0,115200n8" - I booted a machine with your patch last Friday but it hasn't fallen over yet. Hopefully we'll be able to make this happen today. Cheers Hi Manfred, We've managed to crash a machine twice running your patched up kernel and both times the system has just dead locked without giving an oops on the console. How important is it to get an oops with that patch? Cheers I pulled from the netdev-2.6 fe-lock branch, copied in my .config, and booted the system on "2.6.23-rc9-g1db49136-dirty". The system still crashed but it, non emphatically, seems like it took a bit longer than usual to go down. For the 3rd time in a row we didn't get the oops on the serial console (i'm guessing just bad luck) but I did see this message appear in the dmesg after we put a load on the system. WARNING: at net/core/dev.c:2154 net_rx_action() Call Trace: <IRQ> [<ffffffff804c25c9>] net_rx_action+0xc9/0x190 [<ffffffff8023ac6d>] __do_softirq+0x55/0xc4 [<ffffffff8020ccfc>] call_softirq+0x1c/0x28 [<ffffffff8020de3d>] do_softirq+0x2c/0x7d [<ffffffff8020e0f9>] do_IRQ+0x13e/0x161 [<ffffffff8020ae11>] default_idle+0x0/0x3d [<ffffffff8020c081>] ret_from_intr+0x0/0xa <EOI> [<ffffffff8020ae3a>] default_idle+0x29/0x3d [<ffffffff8020aed9>] cpu_idle+0x8b/0xae Hi Manfred, We're giving your debugging patch another try - we're trying to crash it now. Here is what it prints into the dmesg right after being loaded. test 1: ok test 1: ok done, test 2 start forcedeth: lock error. Previous owner ffffffff8809ff8f (0), now ffffffff8809ff9b (0) nv_open+0x5f/0x53a [forcedeth] nv_open+0x6b/0x53a [forcedeth] test 2: fail done, test 3 start forcedeth: lock not held ffffffff8809ffc1 (0) nv_open+0x6b/0x53a [forcedeth] test 3: fail done Manfred, Here is the oops message from a system while running your debugging patch. Unable to handle kernel NULL pointer dereference at 000000000000007c RIP: [<ffffffff880a059e>] :forcedeth:nv_rx_process_optimized+0xe6/0x36b eth0: too many iterations (6) in nv_nic_irq. 
PGD 2275e7067 PUD 223784067 PMD 0 Oops: 0000 [1] SMP CPU 2 Modules linked in: autofs4 rtc_cmos rtc_core rtc_lib forcedeth aacraid 3w_9xxx 3w_xxxx atp870u aic7xxx arcmsr tg3 e1000 dm_mirror Pid: 0, comm: swapper Not tainted 2.6.23-rc7-git1 #3 RIP: 0010:[<ffffffff880a059e>] [<ffffffff880a059e>] :forcedeth:nv_rx_process_optimized+0xe6/0x36b RSP: 0018:ffff810229bb3e60 EFLAGS: 00010246 RAX: 0000000014000000 RBX: 0000000000000000 RCX: 0000000004000000 RDX: ffff810227bc7a00 RSI: 000000021f75b810 RDI: ffff8102294f3870 RBP: 0000000034020202 R08: 0000000000000004 R09: ffff810227bc7a00 R10: 0000000000000001 R11: 0000000000000002 R12: ffff810223632740 R13: 0000000000000202 R14: ffff810223632000 R15: 0000000000000001 FS: 00002b13d1deaae0(0000) GS:ffff81022a501600(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 000000000000007c CR3: 0000000216d2c000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process swapper (pid: 0, threadinfo ffff810229bac000, task ffff810229baa000) Stack: 0000004025224200 ffff810223632740 0000000000000003 ffffc20000c56000 ffff810223632000 0000000000000001 0000000000000000 ffffffff880a09c3 ffff810223632740 ffff810223632000 00000000000000ff ffffc20000c56000 Call Trace: <IRQ> [<ffffffff880a09c3>] :forcedeth:nv_nic_irq_optimized+0x89/0x22c [<ffffffff880a1534>] :forcedeth:nv_do_nic_poll+0x20f/0x2b1 [<ffffffff880a1325>] :forcedeth:nv_do_nic_poll+0x0/0x2b1 [<ffffffff8023ded0>] run_timer_softirq+0x16a/0x1df [<ffffffff8023ad09>] __do_softirq+0x55/0xc4 [<ffffffff8020cd7c>] call_softirq+0x1c/0x28 [<ffffffff8020debd>] do_softirq+0x2c/0x7d [<ffffffff8021bbff>] smp_apic_timer_interrupt+0x49/0x5e [<ffffffff8020ae61>] default_idle+0x0/0x3d [<ffffffff8020c826>] apic_timer_interrupt+0x66/0x70 <EOI> [<ffffffff8020ae8a>] default_idle+0x29/0x3d [<ffffffff8020af5a>] cpu_idle+0xbc/0xdf Code: 8a 43 7c 83 e0 f3 83 c8 04 88 43 7c 83 7b 6c 00 8b 83 a4 00 RIP [<ffffffff880a059e>] :forcedeth:nv_rx_process_optimized+0xe6/0x36b RSP <ffff810229bb3e60> CR2: 000000000000007c Kernel panic - not syncing: Aiee, killing interrupt handler! I don't know if this is related to the debugging patch or not but on Tyan S2912 boards occasionally the nic will fail to initialize. From the dmeseg: ADDRCONF(NETDEV_UP): eth0: link is not ready ethtool eth0 Settings for eth0: Supported ports: [ MII ] Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full Supports auto-negotiation: Yes Advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full Advertised auto-negotiation: Yes Speed: Unknown! (65535) Duplex: Unknown! (255) Port: MII PHYAD: 1 Transceiver: external Auto-negotiation: on Supports Wake-on: g Wake-on: d Link detected: no Rebooting the systems brings the interfaces back Created attachment 13121 [details]
dmesg output from debugging patch
Hi Manfred, it looks like we did actually capture the dmesg into the logs before the system went down this time. Please let us know what the path forward is from here; this is starting to become an urgent issue. Also, I've discovered that it's much easier to trigger the oops by reducing the MTU. I imagine that this is just because the driver has to service more interrupts. An MTU of 500 seems to be fairly effective.

Could you send me forcedeth.o? I'll try to disassemble the code. All crashes are in the same line, always at offset 0x007c. Perhaps it's a simple bug and not a race? Especially:

>>>
Unable to handle kernel NULL pointer dereference at 000000000000007c
RIP: [<ffffffff880a059e>] :forcedeth:nv_rx_process_optimized+0xe6/0x36b
eth0: too many iterations (6) in nv_nic_irq.
>>>

First the oops, then the message about too many iterations. Very odd.

Created attachment 13138 [details]
test patch - is skb really NULL?
I've compiled forcedeth.o on my x86-64 system:
It seems there is only one reference to 0x007c in nv_rx_process_optimized:
The skb->ip_summed bitfield. Thus skb seems to be NULL.
Thus np->get_rx_ctx->skb is NULL at the beginning of the packet rx loop.
Joshua, you wrote that you can reproduce the bug fairly reliably.
Could you apply the attached patch? I just want to confirm my suspicion.
Thanks
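(Editor's note: a guard of the kind this test patch is meant to add would look roughly like the sketch below, assuming forcedeth's fe_priv/get_rx_ctx/skb naming; the wrapper function name is hypothetical, and the check only confirms that the skb slot can be NULL rather than fixing the underlying race.)

```c
#include <linux/netdevice.h>

/* Hypothetical wrapper around the start of the rx loop; fe_priv, get_rx_ctx
 * and skb are the driver's own names, but this function is not the attached
 * patch. Returns nonzero if the slot must be skipped. */
static int nv_rx_slot_is_empty_sketch(struct fe_priv *np, struct net_device *dev)
{
	if (unlikely(np->get_rx_ctx->skb == NULL)) {
		/* This is the state that becomes the NULL dereference at
		 * offset 0x7c once skb->ip_summed is read; report it and
		 * bail out of rx processing instead of oopsing. */
		printk(KERN_ERR "%s: rx_process with np->get_rx_ctx->skb = NULL\n",
		       dev->name);
		return 1;
	}
	return 0;
}
```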
My suspicion was that nv_alloc_rx_optimized was called simultaneously by two different callers. That could cause np->put_rx to be incremented twice and therefore leave an empty rx skb slot. That's why I created the "New patch for locking" patch with spin_locks around the allocation routine. But Joshua stated that with that patch the system hit a hard lock...

Created attachment 13139 [details]
forcedeth.o
Hi Manfred.
Josh appears to be out sick today, so I'm attaching forcedeth.o for you. Dunno if you still need it, but trying to help.
-rw-r--r-- 1 root root 58616 Oct 9 13:32 /usr/src/linux/drivers/net/forcedeth.o
Paul.
Hi Manfred, I'm actually in the office today. I applied the skb null test patch on top of your debbuging patch and it looks like we hit pay dirt! The dmesg is full of errors and every once in awhile the "skb = NULL" printf shows up. Paul tells me that ping is showing packet loss to this system with this patch installed so we are indeed hitting this case. Any idea as to why Ayaz's patch didn't fix this before? nv_do_nic_poll+0xb5/0x2b1 [forcedeth] nv_nic_irq_optimized_wrap+0x18/0x37 [forcedeth] rx_process_optimized with np->get_rx_ctx->skb = NULL np ffff810227854740 np->get_rx.ex ffff8102262c2730 np->put_rx.ex ffff8102262c2540 rx_processed_cnt 1 forcedeth: lock not held ffffffff880a0bd4 (3) nv_nic_irq_optimized_wrap+0x18/0x37 [forcedeth] eth0: too many iterations (6) in nv_nic_irq. forcedeth: lock error. Previous owner ffffffff880a1418 (3), now ffffffff880a0bbc (3) Hi Manfred, It looks like there is another NULL pointer dereference that appears once the first one is papered over. Unable to handle kernel NULL pointer dereference at 0000000000000210 RIP: [<ffffffff804bfa6b>] netif_receive_skb+0xf/0x248 PGD 21f906067 PUD 2272d0067 PMD 0 Oops: 0000 [1] SMP CPU 1 Modules linked in: autofs4 rtc_cmos rtc_core rtc_lib forcedeth aacraid 3w_9xxx 3w_xxxx atp870u aic7xxx arcmsr tg3 e1000 dm_mirror Pid: 0, comm: swapper Not tainted 2.6.23-rc7-git1 #3 RIP: 0010:[<ffffffff804bfa6b>] [<ffffffff804bfa6b>] netif_receive_skb+0xf/0x248 RSP: 0018:ffff810229b5be90 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff81008499c600 RCX: 0000000000000006 RDX: ffff81000103ea88 RSI: ffff81022c211a20 RDI: ffff81008499c600 RBP: ffff81000103eac0 R08: ffff81022a501342 R09: ffff8101795ffd38 R10: 000000000000fc00 R11: ffff8101795ffd38 R12: 0000000000000001 R13: ffff810229b5bf0c R14: ffff81000103ea80 R15: 000000010004cfb6 FS: 00002b7502c5eae0(0000) GS:ffff81022a501400(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000000000000210 CR3: 0000000227312000 CR4: 00000000000006e0 DR0: 000000000 �` 0000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process swapper (pid: 0, threadinfo ffff810229b56000, task ffff810229b54000) Stack: 0000000000000000 ffff81000103eac0 0000000000000001 ffff810229b5bf0c ffff81000103ea80 ffffffff804c1d4d 0000004029605400 ffff81000103eac0 0000000000000000 ffff81000103eaa0 ffff81000103ea80 000000010004cfb6 Call Trace: <IRQ> [<ffffffff804c1d4d>] process_backlog+0x84/0x102 [<ffffffff804c1f57>] net_rx_action+0xa8/0x1ad [<ffffffff8023ad09>] __do_softirq+0x55/0xc4 [<ffffffff8020cd7c>] call_softirq+0x1c/0x28 [<ffffffff8020debd>] do_softirq+0x2c/0x7d [<ffffffff8021bbff>] smp_apic_timer_interrupt+0x49/0x5e [<ffffffff8020ae61>] default_idle+0x0/0x3d [<ffffffff8020c826>] apic_timer_interrupt+0x66/0x70 <EOI> [<ffffffff8020ae8a>] default_idle+0x29/0x3d [<ffffffff8020af5a>] cpu_idle+0xbc/0xdf Code: 48 83 b8 10 02 00 00 00 74 49 48 8b a8 60 04 00 00 48 85 ed RIP [<ffffffff804bfa6b>] netif_receive_skb+0xf/0x248 RSP <ffff810229b5be90> CR2: 0000000000000210 Unable to handle kernel NULL pointer dereference<0>Kernel panic - not syncing: Aiee, killing interrupt handler! 
at 0000000000000210 RIP: [<ffffffff804bfa6b>] netif_receive_skb+0xf/0x248 PGD 223930067 PUD 2263be067 PMD 0 Oops: 0000 [2] SMP CPU 3 Modules link400 Process syslog-ng (pid: 7097, threadinfo ffff810223aac000, task ffff8102275c7740) Stack: 0000000000000000 ffff810001050ac0 0000000000000010 ffff810229be7efc ffff810001050a80 ffffffff804c1d4d 0000004000000287 ffff810001050ac0 0000000000000000 ffff810001050aa0 ffff810001050a80 000000010004cfb6 Call Trace: <IRQ> [<ffffffff804c1d4d>] process_backlog+0x84/0x102 [<ffffffff804c1f57>] net_rx_action+0xa8/0x1ad [<ffffffff8023ad09>] __do_softirq+0x55/0xc4 [<ffffffff8020cd7c>] call_softirq+0x1c/0x28 [<ffffffff8020debd>] do_softirq+0x2c/0x7d [<ffffffff8020e179>] do_IRQ+0x13e/0x161 [<ffffffff8020c101>] ret_from_intr+0x0/0xa <EOI> [<ffffffff8028f0c5>] do_sys_poll+0x193/0x360 [<ffffffff8028f0c2>] do_sys_poll+0x190/0x360 [<ffffffff8028fbaf>] __pollwait+0x0/0xe1 [<fffffff1c2 [<ffffffff8028b601>] getname+0x14c/0x191 [<ffffffff80358881>] _atomic_dec_and_lock+0x39/0x58 [<ffffffff80286540>] sys_newstat+0x28/0x31 [<ffffffff8028f2c4>] sys_poll+0x32/0x3b [<ffffffff8020bbee>] system_call+0x7e/0x83 Code: 48 83 b8 10 02 00 00 00 74 49 48 8b a8 60 04 00 00 48 85 ed RIP [<ffffffff804bfa6b>] netif_receive_skb+0xf/0x248 RSP <ffff810229be7e80> CR2: 0000000000000210 WARNING: at drivers/char/vt.c:3566 do_unblank_screen() Call Trace: <IRQ> [<ffffffff803a7cb5>] do_unblank_screen+0x3e/0x120 [<ffffffff8035ea15>] bust_spinlocks+0x19/0x2c [<ffffffff80545b9c>] oops_end+0x15/0x57 [<ffffffff805475c5>] do_page_fault+0x6f0/0x7d1 [<ffffffff804e7c7e>] tcp_rcv_established+0x117/0x8cd [<ffffffff8054580d>] error_exit+0x0/0x84 [<ffffffff804bfa6b>] ne}%IE (In reply to comment #45) > Any idea as to why Ayaz's patch didn't fix this before? > > nv_do_nic_poll+0xb5/0x2b1 [forcedeth] > nv_nic_irq_optimized_wrap+0x18/0x37 [forcedeth] > rx_process_optimized with np->get_rx_ctx->skb = NULL > np ffff810227854740 np->get_rx.ex ffff8102262c2730 np->put_rx.ex > ffff8102262c2540 rx_processed_cnt 1 > forcedeth: lock not held ffffffff880a0bd4 (3) > nv_nic_irq_optimized_wrap+0x18/0x37 [forcedeth] > eth0: too many iterations (6) in nv_nic_irq. > forcedeth: lock error. Previous owner ffffffff880a1418 (3), now > ffffffff880a0bbc (3) > Could you send me a larger block from the log? 20 lines (preferably the first) skb=NULL message and 20 lines after the message. Thanks, Manfred Created attachment 13141 [details]
bugfix: pci_enable_msi changes pci_dev->irq
Hi Joshua,
I think I found the bug:
pci_enable_msi() changes pci_dev->irq.
Half of the driver used the new value, half used a stale copy in dev->irq.
Could you try the attached patch? It updates the copy in dev->irq [for users outside forcedeth.c] and replaces all accesses in forcedeth.c with direct accesses of pci_dev->irq.
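(Editor's note: a condensed sketch of the idea behind this fix, not the attachment itself; nv_request_irq_sketch is a placeholder name, and the real patch touches every irq access in forcedeth.c.)

```c
#include <linux/pci.h>
#include <linux/interrupt.h>
#include <linux/netdevice.h>

/* Sketch only: after pci_enable_msi() the vector number lives in
 * pci_dev->irq, so the stale copy cached in net_device->irq must be
 * refreshed, and request_irq()/free_irq()/disable_irq() must all use
 * pci_dev->irq rather than the pre-MSI value. */
static int nv_request_irq_sketch(struct net_device *dev, struct pci_dev *pci_dev)
{
	if (pci_enable_msi(pci_dev) == 0) {
		/* pci_enable_msi() rewrote pci_dev->irq; keep the copy seen
		 * outside forcedeth.c in sync with it. */
		dev->irq = pci_dev->irq;
	}

	/* Register the handler against the vector the device will actually
	 * raise (nv_nic_irq_optimized is the existing forcedeth handler). */
	return request_irq(pci_dev->irq, nv_nic_irq_optimized, 0, dev->name, dev);
}
```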
Hi Manfred, we tested your latest patch on top of 2.6.23-rc7-git1 + your two other debugging patches, and after an hour of testing it appears that the oops is fixed! We're about to try some more extensive testing with this patch against 2.6.23 (the patch applies cleanly) with CONFIG_FORCEDETH_NAPI set, as this is how we'd like to run the driver in production. I'll report back the results one way or the other.

Hi Manfred, thus far it appears that this issue is resolved. I'm going to be rolling out this patch to 7 more machines today. If all goes well I'll be ready to stick a "Tested-by:" on this patch by the end of the day. I would also really like to see it submitted for 2.6.23.y. Do you feel that it is in good enough shape for the stable series, or should I submit it to be part of my distribution (Gentoo) kernel patch set for a while?

It looks like there may still be a problem. We got another crash, but without an oops on the console. We are attempting to recreate the failure.

We have two different models of motherboard with the MCP55: Tyan S2912s and S2927s. We started debugging this issue on an S2927 (most of the bug reports) but eventually switched to an S2912 with an IPMI card installed, as the original system is not on remote-managed power and power cycling it by hand was becoming tedious. It seems that the way IRQs are set up on these motherboards is slightly different.

Here is the dmesg from the S2912 system:

forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.
ACPI: PCI Interrupt Link [LMAC] enabled at IRQ 18
ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LMAC] -> GSI 18 (level, high) -> IRQ 18
PCI: Setting latency timer of device 0000:00:08.0 to 64
forcedeth: using HIGHDMA
eth0: forcedeth.c: subsystem: 010f1:2912 bound to 0000:00:08.0
ACPI: PCI Interrupt Link [LMA2] enabled at IRQ 17
ACPI: PCI Interrupt 0000:00:09.0[A] -> Link [LMA2] -> GSI 17 (level, high) -> IRQ 17
PCI: Setting latency timer of device 0000:00:09.0 to 64
forcedeth: using HIGHDMA
eth1: forcedeth.c: subsystem: 010f1:2912 bound to 0000:00:09.0

And here is the dmesg from the S2927 system:

forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.
ACPI: PCI Interrupt Link [LMAC] enabled at IRQ 22
ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LMAC] -> GSI 22 (level, low) -> IRQ 22
PCI: Setting latency timer of device 0000:00:08.0 to 64
forcedeth: using HIGHDMA
eth0: forcedeth.c: subsystem: 010de:cb84 bound to 0000:00:08.0
ACPI: PCI Interrupt Link [LMAD] enabled at IRQ 21
ACPI: PCI Interrupt 0000:00:09.0[A] -> Link [LMAD] -> GSI 21 (level, low) -> IRQ 21
PCI: Setting latency timer of device 0000:00:09.0 to 64
forcedeth: using HIGHDMA
eth1: forcedeth.c: subsystem: 010de:cb84 bound to 0000:00:09.0

Both systems are running the exact same kernel and modules (rsynced to both from a 3rd machine), and neither system is passing any options to the forcedeth module. We have been unable to get the S2912 system to crash with Manfred's latest patch on it, and we've been hammering on it pretty hard, whereas we can still fairly easily push the S2927 system over.

Created attachment 13214 [details]
Manfred's patch with trivial changes need to apply to netdev-2.6
I see this patch has been committed to netdev... does that mean there is no interest in resolving these issues on the S2927 boards?

Yes, we will try to fix the S2927 boards, too. My patch was applied immediately because it fixes an obvious bug. Do the S2912 boards use MSI interrupts, or does that board use MSI-X?

It appears to be MSI for both. I'm attaching the /proc/interrupts from both systems.

Created attachment 13328 [details]
s2912 /proc/interupts
Created attachment 13329 [details]
s2927 /proc/interupts
Can I provide any other information? Just an FYI, we can no longer help debug this issue on the Tyan S2927s. We've replaced all 17 of our S2927s with S2912s.

I am also experiencing this problem (even with the latest 2.6.24-rc5). Under the heavy load of Azureus bittorrent downloading/uploading, I notice a lot of lines like this in dmesg:

eth1: too many iterations (6) in nv_nic_irq.

I don't know about 2.6.24-rc5 yet, but on 2.6.22 after a while my network connection stopped, and even reloading the forcedeth module did not help. What should I do to eliminate this behaviour and help with fixing the bug?

I'm seeing exactly the same thing as Anton Kuzmin on the S2912s, even under 2.6.24. The heavier the load, the faster the network dies (without an oops), and the system needs to be rebooted to recover the interface. Loading/unloading the forcedeth module can cause the system to hang.

Yes, this bug is really annoying. We're running S2927 boards and having exactly the same problems. Is there any official statement on a work-around or bug fix being developed? How can we provide useful information?

I have a similar problem on an MCP61 MSI board. I don't use the NIC because of these NIC-reset/crash problems, but even without traffic and just the module loaded, the card makes the system hang on... shutdown. That's related to an skb_free. I'll try to take a screenshot. Please also read the link in my comments on: http://bugzilla.kernel.org/show_bug.cgi?id=8700

I have posted a bug regarding this same issue, before seeing this thread :( I have a Tyan board with two dual-core Opteron 285s and an NVIDIA chipset. I am using both gigabit adaptors (forcedeth), one on the local LAN (gigabit), the other on ADSL. The second "dumps tx registers" every so often, on 2.6.22, 2.6.23, 2.6.24, and 2.6.24.3, while the other (with the highest NFS bandwidth) stays put.

As noted above, the only solution seems to be to use another NIC. We've switched to ordering systems with a different southbridge/integrated NIC as well. I guess the netdev folks are just as baffled by this as we are, as I haven't heard anything since October.

Has anyone tested with the latest netdev sources? There appears to be a patch in there related to reset problems.

The MCP55 problems continue until at least 2.6.24.2 in our in-house testing. Does anyone know of a known working version of forcedeth.c? It looks like we may be forced to try 2.6.26 in production, as this issue is so difficult to trip with artificial testing.

Hi Joshua, I can help debug this issue. Let's try to get a baseline configuration working and then we can start debugging with more features. Can you try using the latest forcedeth driver from Jeff Garzik's netdev branch with MSI turned off and NAPI turned off? Ayaz.

Sure, I just cloned the tree and made a build. Do you think that these issues are already resolved?

There was an MSI fix recently. As for NAPI, I believe there is a race condition for turning interrupts on/off between the 3 threads: nv_nic_irq_optimized, nv_do_nic_poll and nv_do_napi_poll. I am in the midst of fixing that race.

I couldn't get the RTC working with the netdev git tree I pulled yesterday... I just re-pulled the tree and will try again. We need a working RTC so we can test in our production environment.

Hi Ayaz, I've got a build of 2.6.27-rc5 from netdev-2.6 running on the systems that seems to have eliminated a large majority of the problems we were attributing to forcedeth.c. Unfortunately this required fighting a number of other issues (see bug #11388 & bug #11543).
There are still other problems going on that may still be rooted in forcedeth.c. Currently, are all of the forcedeth.c patches from netdev merged into linus/2.6.27-rc7? I'd like to switch over to the tip tree for debugging purposes but haven't been able to get a happy merge of the netdev & tip trees with git.

Created attachment 18048 [details]
2.6.27-rc5 (netdev-2.6) .config
Hi Ayaz, I think that 2.6.27-rc5 has more or less ended our forcedeth.c NIC deaths and system crashes. NAPI is off and we are loading the driver with these options:

options forcedeth msi=0 msix=0

The dmesg is overflowing with this message:

[1110687.465324] eth0: too many iterations (6) in nv_nic_irq.
[1110709.082624] eth0: too many iterations (6) in nv_nic_irq.
[1110752.755563] eth0: too many iterations (6) in nv_nic_irq.
[1110754.200284] eth0: too many iterations (6) in nv_nic_irq.

As we haven't upped the number of allowable packets as in the past, this has caused the system crashes to get even worse. Where do you suggest we go from here?

There is a patch submitted by Joe Korty regarding bumping up max_interrupt_work, so it would be worthwhile to try it out on top of the current working changes you have. Here is the patch:

From: Joe Korty <joe.korty@ccur.com>

This eliminates the following often-generated warning from my 64 bit Opteron SMP test stand:

eth0: too many iterations (6) in nv_nic_irq

According to the web, the problem is that the forcedeth driver has a too-low value for max_interrupt_work. Grepping the kernel I see that forcedeth has the second lowest value of all ethernet drivers (ie, 6). Most are in the 20-40 range. So this patch increases this a bit, from 6 to 15 (at 15 forcedeth becomes the driver with third-lowest max_interrupt_work value). My test stand, which used to print out the above warnings repetitively whenever it was under heavy net load, no longer does so.

Signed-off-by: Joe Korty <joe.korty@ccur.com>
Cc: Ayaz Abdulla <aabdulla@nvidia.com>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/net/forcedeth.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -puN drivers/net/forcedeth.c~forcdeth-increase-max_interrupt_work drivers/net/forcedeth.c
--- a/drivers/net/forcedeth.c~forcdeth-increase-max_interrupt_work
+++ a/drivers/net/forcedeth.c
@@ -818,7 +818,7 @@ struct fe_priv {
  * Maximum number of loops until we assume that a bit in the irq mask
  * is stuck. Overridable with module param.
  */
-static int max_interrupt_work = 5;
+static int max_interrupt_work = 15;
 
 /*
  * Optimization can be either throuput mode or cpu mode
_

At least for us, max_interrupt_work = 15 isn't enough. Even with a modest load the dmesg is filling up with:

eth0: too many iterations (16) in nv_nic_irq.

That's not to say that this patch isn't the right thing to do. The driver options we are using are:

options forcedeth msi=0 msix=0 max_interrupt_work=15

I see the max_interrupt_work patch hasn't been committed to netdev-2.6. I'd suggest holding off on it for a little bit. I'm going to give max_interrupt_work=30 a try in production.

It looks like max_interrupt_work=30 almost completely (but not completely) eliminated the

[244205.344674] eth0: too many iterations (31) in nv_nic_irq.

entries in the dmesg. My recommendation is that this patch be revised and committed. We will probably experiment with slightly higher values over the next few weeks.

Can someone take a look at bug #11543 and determine if it is related to forcedeth.c?