Subject    : 2.6.26-rc8-git3 forcedeth WARNING (kills the interface)
Submitter  : Brad Campbell <brad@wasp.net.au>
Date       : 2008-07-03 10:07
References : http://marc.info/?l=linux-netdev&m=121508714430752&w=4

This entry is being used for tracking a regression from 2.6.25. Please don't close it until the problem is fixed in the mainline.
On Sunday, 6 of July 2008, Brad Campbell wrote:
> Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.25. Please verify if it still should be listed.
> >
> >
> > Bug-Entry  : http://bugzilla.kernel.org/show_bug.cgi?id=11039
> > Subject    : 2.6.26-rc8-git3 forcedeth WARNING (kills the interface)
> > Submitter  : Brad Campbell <brad@wasp.net.au>
> > Date       : 2008-07-03 10:07 (4 days old)
> > References : http://marc.info/?l=linux-netdev&m=121508714430752&w=4
>
> While it is certainly a problem I can't verify it as a regression. When I got
> the machine I ran it with 2.6.25 but found SATA errors were locking the box.
>
> The SATA issue is resolved with 2.6.26-rc and I'm not terribly keen to risk
> my data to go back and check unless someone absolutely needs me to.
>
> It does appear to be quite a problem though.
>
> brad@srv:~$ dmesg | head -n5
> [ 0.000000] Linux version 2.6.26-rc8-git4 (brad@srv) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #5 SMP Fri Jul 4 23:08:38 GST 2008
> [ 0.000000] Command line: root=/dev/md1 ro
> [ 0.000000] BIOS-provided physical RAM map:
> [ 0.000000] BIOS-e820: 0000000000000000 - 000000000009d400 (usable)
> [ 0.000000] BIOS-e820: 000000000009d400 - 00000000000a0000 (reserved)
>
> brad@srv:~$ dmesg | grep 'eth1: tx_timeout' | wc -l
> 27
>
> brad@srv:~$ uptime
> 17:40:25 up 1 day, 1:15, 5 users, load average: 0.73, 0.61, 0.49
Created attachment 16755
dmesg of complete boot and failure with acpi, apic and msi disabled
On Sunday, 13 of July 2008, Brad Campbell wrote:
> Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.25. Please verify if it still should be listed.
> >
> >
> > Bug-Entry  : http://bugzilla.kernel.org/show_bug.cgi?id=11039
> > Subject    : 2.6.26-rc8-git3 forcedeth WARNING (kills the interface)
> > Submitter  : Brad Campbell <brad@wasp.net.au>
> > Date       : 2008-07-03 10:07 (11 days old)
> > References : http://marc.info/?l=linux-netdev&m=121508714430752&w=4
>
> I tested a 2.6.26-rc kernel with forcedeth.c from 2.6.25 and it failed in the
> same way, so I don't think it's a regression as such. In any case, with the
> -rc9 it still fails regularly.
I've just returned from a week away, and as there was nobody home the traffic on eth0 was almost none while eth1 remained quite busy. Over this period the interface did not lock up once. On further testing it appears eth1 only locks up in the event of heavier usage of eth0. eth0 is a GBit connection to my home network while eth1 is connected to my ADSL router directly. eth1 is rate limited to 256kbit out and 760kbit in so it's nowhere near fully loaded, but it does process a lot of packets per second and the traffic stream is pretty much constant at the limits. eth0 utilisation varies from 0 to about 20MB/s as it is the main file serving link for the network.
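To check the apparent link between eth0 load and eth1 lockups, it may help to list when each timeout happened instead of only counting them. The following is a minimal sketch, not a tested diagnostic: the function name `tx_timeout_times` is made up for illustration, and it assumes printk timestamps are enabled (as in the dmesg output above) and that the timeout lines match the same 'ethX: tx_timeout' pattern used in the earlier grep.

```shell
# Hypothetical helper: print the dmesg timestamp (seconds since boot) of
# every tx_timeout line for one interface. Reads kernel log text on stdin.
# Assumes lines look like "[ 1234.567890] eth1: tx_timeout ...".
tx_timeout_times() {
    iface="$1"
    # Keep only the timeout lines for this interface, then strip everything
    # except the number inside the leading "[ ... ]" timestamp.
    grep "${iface}: tx_timeout" |
        sed -n 's/^\[ *\([0-9][0-9]*\.[0-9][0-9]*\)].*/\1/p'
}
```

Running `dmesg | tx_timeout_times eth1` would then give one timestamp per timeout, which can be lined up against periods of heavy file serving on eth0.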
Hi Brad!

I just stumbled upon this bugreport. I'm guessing this is fixed in the meantime? Or do you still suffer these lockups / changed hardware?

Also there was a commit in 2009 to fix the tx-ring setup in case of tx_timeout which could maybe have fixed the reported warning... potentially...

commit 8f955d7f042e4ac44891a400d5000928f8db9f58
Author: Ayaz Abdulla <aabdulla@nvidia.com>
Date:   Sat Apr 25 09:17:56 2009 +0000

    forcedeth: tx timeout fix

    This patch fixes the tx_timeout() to properly handle the clean up of the
    tx ring. It also sets the tx put pointer back to the correct position to
    be in sync with HW.

    Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

(this would fit the reported stacktrace):

[ 2936.865224] Call Trace:
[ 2936.865224] <IRQ> [<ffffffff802338e4>] warn_on_slowpath+0x64/0xa0
[ 2936.865224] [<ffffffffa0046198>] :forcedeth:reg_delay+0x58/0xb0
[ 2936.865224] [<ffffffffa0046fb6>] :forcedeth:nv_drain_tx+0xb6/0x1a0
[ 2936.865224] [<ffffffffa004504a>] :forcedeth:setup_hw_rings+0x2a/0x100
[ 2936.865224] [<ffffffffa00492e7>] :forcedeth:nv_tx_timeout+0x287/0x2c0
[ 2936.865224] [<ffffffff80463ce5>] dev_watchdog+0xf5/0x110
[ 2936.865224] [<ffffffff8023d1b2>] run_timer_softirq+0x192/0x200
[ 2936.865224] [<ffffffff802391a9>] __do_softirq+0x69/0xe0
[ 2936.865224] [<ffffffff8020c61c>] call_softirq+0x1c/0x30
[ 2936.865224] [<ffffffff8020eb85>] do_softirq+0x35/0x70
[ 2936.865224] [<ffffffff80239137>] irq_exit+0x87/0x90
[ 2936.865224] [<ffffffff8021b5fc>] smp_apic_timer_interrupt+0x7c/0xc0
[ 2936.865224] [<ffffffff8020c0c6>] apic_timer_interrupt+0x66/0x70
[ 2936.865224] <EOI>
[ 2936.865224] ---[ end trace 6e6bcab61ac567c9 ]---
Please reopen if this still happens.