Bug 15398 - Networking hangs randomly.
Status: RESOLVED OBSOLETE
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: All Linux
Importance: P1 normal
Assignee: Francois Romieu
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-02-25 19:17 UTC by TAXI
Modified: 2012-06-27 13:43 UTC
CC List: 6 users

See Also:
Kernel Version: 2.6.33-rc8
Subsystem:
Regression: No
Bisected commit-id:



Description TAXI 2010-02-25 19:17:30 UTC
The networking (I don't know exactly which part; I suspect the sis190 driver) hangs randomly.
It was very hard to track down the problem because most of the time there is no info in syslog.
But yesterday there was:

[59668.053078] ------------[ cut here ]------------
[59668.053095] WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x215/0x21e()
[59668.053098] Hardware name: P5SD1-FM2
[59668.053100] NETDEV WATCHDOG: eth0 (sis190): transmit queue 0 timed out
[59668.053102] Modules linked in: sco kqemu tuner_simple tuner_types
[59668.053111] Pid: 31159, comm: chrome Not tainted 2.6.33-rc8-T2 #5
[59668.053113] Call Trace:
[59668.053116]  <IRQ>  [<ffffffff81038b34>] ? warn_slowpath_common+0x65/0xc8
[59668.053128]  [<ffffffff810345fa>] ? try_preempt+0x18f/0x1c3
[59668.053132]  [<ffffffff81038beb>] ? warn_slowpath_fmt+0x40/0x45
[59668.053137]  [<ffffffff81234d42>] ? strlcpy+0x33/0x51
[59668.053140]  [<ffffffff815197ad>] ? dev_watchdog+0x215/0x21e
[59668.053146]  [<ffffffff81053ee1>] ? thread_group_cputimer+0x2b/0x141
[59668.053149]  [<ffffffff8105c7ea>] ? ktime_get+0x58/0xf4
[59668.053152]  [<ffffffff81519598>] ? dev_watchdog+0x0/0x21e
[59668.053157]  [<ffffffff81045b53>] ? run_timer_softirq+0x231/0x795
[59668.053160]  [<ffffffff8105c7ea>] ? ktime_get+0x58/0xf4
[59668.053163]  [<ffffffff8103ef8f>] ? __do_softirq+0x95/0x13e
[59668.053168]  [<ffffffff8101904c>] ? lapic_next_event+0x18/0x1f
[59668.053172]  [<ffffffff8100330c>] ? call_softirq+0x1c/0x30
[59668.053175]  [<ffffffff8100536a>] ? do_softirq+0x3a/0x68
[59668.053178]  [<ffffffff8103eef7>] ? irq_exit+0x8b/0x8e
[59668.053181]  [<ffffffff8101986a>] ? smp_apic_timer_interrupt+0x6b/0x98
[59668.053183]  [<ffffffff81002dd3>] ? apic_timer_interrupt+0x13/0x20
[59668.053185]  <EOI> 
[59668.053188] ---[ end trace c7e590599cdb97ab ]---
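
Since the timeout only rarely leaves a trace, it may help to scan dmesg or syslog output automatically for the watchdog message shown above. The following is a minimal sketch, not part of the original report: the function name `find_watchdog_timeouts` is made up for illustration, and the pattern assumes the message wording seen in this trace ("NETDEV WATCHDOG: eth0 (sis190): transmit queue 0 timed out"), which other kernel versions may phrase differently.

```python
import re

# Matches the watchdog message as it appears in the traces in this report, e.g.
# "NETDEV WATCHDOG: eth0 (sis190): transmit queue 0 timed out".
# Assumption: the wording is the same on the kernel being checked.
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def find_watchdog_timeouts(lines):
    """Return an (iface, driver, queue) tuple for each watchdog timeout line."""
    hits = []
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            hits.append((match["iface"], match["driver"], int(match["queue"])))
    return hits
```

Passing it the lines of a saved log (for example `find_watchdog_timeouts(open(path))`, where `path` points at a syslog copy) lists the affected interface, driver and queue for each timeout.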
Comment 1 TAXI 2010-02-25 21:27:39 UTC
Posted with the wrong version (Version: 2.5). I will close this and open a new bug report.
Sorry for that.
Comment 2 TAXI 2010-02-25 21:29:57 UTC
I can't find a way to enter the version when opening a new bug.
Comment 3 Andrew Morton 2010-03-02 20:31:44 UTC
yup, sis190 is getting stuck.

Is this a regression?  Were any earlier kernel versions OK?  If so, which version(s)?
Comment 4 TAXI 2010-03-07 09:53:36 UTC
It's not a regression.
It's very hard to say which kernel versions are OK and which are not, because of the nature of this bug: it occurs very rarely, and when it does, it's hard to trace (remember: most of the time there is no info in syslog).
I think the bug occurs more often if you unplug the network cable, wait a bit, and plug it back in, but I'm not sure about that.
Comment 5 nissarin 2010-03-20 15:18:14 UTC
I'm wondering whether this is the same thing I'm experiencing on my box. I've searched the web, including Bugzilla, and found quite a lot of reports, but this one looks the most similar to mine.
It's also similar to #15232 and #15139. Speaking of which, the latter has a patch that adds additional debugging information; is it possible to adapt that patch for r8169 (is s/e1000/r8169/ enough)?

Mar 15 19:54:45 radscorpion kernel: ------------[ cut here ]------------
Mar 15 19:54:45 radscorpion kernel: WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x272/0x280()
Mar 15 19:54:45 radscorpion kernel: Hardware name: GA-MA78G-DS3H
Mar 15 19:54:45 radscorpion kernel: NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Mar 15 19:54:45 radscorpion kernel: Modules linked in: tun oss_usb oss_hdaudio osscore bridge stp llc pl2303 usbserial firewire_ohci usb_storage firewire_core i2c_piix4 psmouse
Mar 15 19:54:45 radscorpion kernel: Pid: 0, comm: swapper Not tainted 2.6.33-00159-gd424b92 #6
Mar 15 19:54:45 radscorpion kernel: Call Trace:
Mar 15 19:54:45 radscorpion kernel: <IRQ>  [<ffffffff81417d92>] ? dev_watchdog+0x272/0x280
Mar 15 19:54:45 radscorpion kernel: [<ffffffff81417d92>] ? dev_watchdog+0x272/0x280
Mar 15 19:54:45 radscorpion kernel: [<ffffffff8105f084>] ? warn_slowpath_common+0x74/0xd0
Mar 15 19:54:45 radscorpion kernel: [<ffffffff8105f141>] ? warn_slowpath_fmt+0x51/0x60
Mar 15 19:54:45 radscorpion kernel: [<ffffffff81052e20>] ? activate_task+0x40/0x70
Mar 15 19:54:45 radscorpion kernel: [<ffffffff8105a887>] ? try_to_wake_up+0xb7/0x3c0
Mar 15 19:54:45 radscorpion kernel: [<ffffffff811f5041>] ? strlcpy+0x41/0x50
Mar 15 19:54:45 radscorpion kernel: [<ffffffff814047ab>] ? netdev_drivername+0x3b/0x40
Mar 15 19:54:45 radscorpion kernel: [<ffffffff81417d92>] ? dev_watchdog+0x272/0x280
Mar 15 19:54:45 radscorpion kernel: [<ffffffff8105599c>] ? enqueue_task_fair+0x19c/0x1f0
Mar 15 19:54:45 radscorpion kernel: [<ffffffff81417b20>] ? dev_watchdog+0x0/0x280
Mar 15 19:54:45 radscorpion kernel: [<ffffffff81069dfc>] ? run_timer_softirq+0x13c/0x200
Mar 15 19:54:45 radscorpion kernel: [<ffffffff8105a887>] ? try_to_wake_up+0xb7/0x3c0
Mar 15 19:54:45 radscorpion kernel: [<ffffffff8107d6b1>] ? ktime_get+0x61/0xe0
Mar 15 19:54:45 radscorpion kernel: [<ffffffff81064c06>] ? __do_softirq+0xa6/0x130
Mar 15 19:54:45 radscorpion kernel: [<ffffffff810273cc>] ? call_softirq+0x1c/0x30
Mar 15 19:54:45 radscorpion kernel: [<ffffffff8102919d>] ? do_softirq+0x4d/0x80
Mar 15 19:54:45 radscorpion kernel: [<ffffffff810648f5>] ? irq_exit+0x75/0x90
Mar 15 19:54:45 radscorpion kernel: [<ffffffff8103e64c>] ? smp_apic_timer_interrupt+0x6c/0xa0
Mar 15 19:54:45 radscorpion kernel: [<ffffffff81026e93>] ? apic_timer_interrupt+0x13/0x20
Mar 15 19:54:45 radscorpion kernel: <EOI>  [<ffffffff8102e4d2>] ? default_idle+0x32/0x40
Mar 15 19:54:45 radscorpion kernel: [<ffffffff8102e738>] ? c1e_idle+0xa8/0x100
Mar 15 19:54:45 radscorpion kernel: [<ffffffff8102579a>] ? cpu_idle+0xaa/0x100
Mar 15 19:54:45 radscorpion kernel: [<ffffffff816f2c55>] ? start_kernel+0x35c/0x41d
Mar 15 19:54:45 radscorpion kernel: [<ffffffff816f2373>] ? x86_64_start_kernel+0xe1/0xf2
Mar 15 19:54:45 radscorpion kernel: ---[ end trace 9b1f3a8aa1ae216e ]---
Mar 15 19:54:45 radscorpion kernel: r8169: eth0: link up

It's quite hard to trigger (rtorrent seems to make it happen more easily). I didn't notice it earlier, but I started checking dmesg output quite often once I began playing with KMS. Now that I think about it, I had network problems before: e.g. when playing multiplayer games I experienced 4-5 s lag spikes (I could see opponents' movements but was unable to move: no tx, while rx worked). At the time I thought it was a network-related problem, but if it was caused by this bug, then the problem appeared much earlier (previously I used 2.6.32.7). Too bad I don't remember exactly when the problems started to show up, and unfortunately I cleaned up the logs recently.
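
One way to look for the "no tx, rx worked" symptom described above is to sample the interface packet counters in sysfs and compare two readings. This is only a rough sketch: the function names are made up for illustration, the `/sys/class/net/<iface>/statistics/` counters are the standard kernel ones, and the heuristic is an assumption that cannot tell a real stall from a host that simply had nothing to send.

```python
import os


def read_packet_counters(iface, sysfs_root="/sys/class/net"):
    """Read the cumulative tx/rx packet counters the kernel exposes in sysfs."""
    stats_dir = os.path.join(sysfs_root, iface, "statistics")
    counters = {}
    for name in ("tx_packets", "rx_packets"):
        with open(os.path.join(stats_dir, name)) as fh:
            counters[name] = int(fh.read().strip())
    return counters


def looks_like_tx_stall(previous, current):
    """Heuristic: rx advanced between two samples while tx stayed flat.

    An idle sender also matches, so treat a hit as a hint, not as proof.
    """
    rx_advanced = current["rx_packets"] > previous["rx_packets"]
    tx_flat = current["tx_packets"] == previous["tx_packets"]
    return rx_advanced and tx_flat
```

Taking a sample every second or so while traffic is known to be flowing in both directions (e.g. during a download) and logging each hit with a timestamp would give something to correlate with the lag spikes even when dmesg stays silent.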
Comment 6 IAmACarpetLicker 2010-04-01 23:04:47 UTC
Yeah, I'm getting this too.
Also, I have read several reports and we all seem to have one thing in common: rtorrent.


It only seems to happen for me when the machine is also being used for other things. For example, it downloads using rtorrent, and the hang usually happens when it's both downloading and being used to play files back over the network. It's not the file services, because I've tried NFS, AFP and SMB, all to the same effect.

Apr  1 23:37:14 download-server kernel: [  718.011280] ------------[ cut here ]------------
Apr  1 23:37:14 download-server kernel: [  718.011305] WARNING: at /build/buildd/linux-2.6.32/net/sched/sch_generic.c:261 dev_watchdog+0x262/0x270()
Apr  1 23:37:14 download-server kernel: [  718.011312] Hardware name: EasyNote_MX37-U-004
Apr  1 23:37:14 download-server kernel: [  718.011318] NETDEV WATCHDOG: eth0 (sis190): transmit queue 0 timed out
Apr  1 23:37:14 download-server kernel: [  718.011323] Modules linked in: vmnet parport_pc vsock vmci vmmon ppdev snd_hda_codec_realtek nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc arc4 snd_hda_intel snd_hda_codec snd_hwdep ath5k snd_pcm snd_timer fbcon tileblit font bitblit softcursor mac80211 ath snd lp video output soundcore snd_page_alloc vga16fb vgastate sis190 cfg80211 shpchp mii asus_laptop led_class sis_agp parport sata_sis
Apr  1 23:37:14 download-server kernel: [  718.011425] Pid: 0, comm: swapper Not tainted 2.6.32-17-server #26-Ubuntu
Apr  1 23:37:14 download-server kernel: [  718.011430] Call Trace:
Apr  1 23:37:14 download-server kernel: [  718.011435]  <IRQ>  [<ffffffff81066ceb>] warn_slowpath_common+0x7b/0xc0
Apr  1 23:37:14 download-server kernel: [  718.011456]  [<ffffffff81066d91>] warn_slowpath_fmt+0x41/0x50
Apr  1 23:37:14 download-server kernel: [  718.011465]  [<ffffffff81489842>] dev_watchdog+0x262/0x270
Apr  1 23:37:14 download-server kernel: [  718.011476]  [<ffffffff81080767>] ? insert_work+0x77/0xc0
Apr  1 23:37:14 download-server kernel: [  718.011488]  [<ffffffff810397a9>] ? default_spin_lock_flags+0x9/0x10
Apr  1 23:37:14 download-server kernel: [  718.011497]  [<ffffffff814895e0>] ? dev_watchdog+0x0/0x270
Apr  1 23:37:14 download-server kernel: [  718.011505]  [<ffffffff81077417>] run_timer_softirq+0x197/0x340
Apr  1 23:37:14 download-server kernel: [  718.011516]  [<ffffffff810943a0>] ? tick_sched_timer+0x0/0xc0
Apr  1 23:37:14 download-server kernel: [  718.011525]  [<ffffffff8108f113>] ? ktime_get+0x63/0xe0
Apr  1 23:37:14 download-server kernel: [  718.011534]  [<ffffffff8106e227>] __do_softirq+0xb7/0x1e0
Apr  1 23:37:14 download-server kernel: [  718.011542]  [<ffffffff81093f8a>] ? tick_program_event+0x2a/0x30
Apr  1 23:37:14 download-server kernel: [  718.011551]  [<ffffffff810142ec>] call_softirq+0x1c/0x30
Apr  1 23:37:14 download-server kernel: [  718.011559]  [<ffffffff81015cb5>] do_softirq+0x65/0xa0
Apr  1 23:37:14 download-server kernel: [  718.011566]  [<ffffffff8106e0c5>] irq_exit+0x85/0x90
Apr  1 23:37:14 download-server kernel: [  718.011576]  [<ffffffff8155c021>] smp_apic_timer_interrupt+0x71/0x9c
Apr  1 23:37:14 download-server kernel: [  718.011584]  [<ffffffff81013cb3>] apic_timer_interrupt+0x13/0x20
Apr  1 23:37:14 download-server kernel: [  718.011589]  <EOI>  [<ffffffff8130c7ce>] ? acpi_idle_enter_simple+0x117/0x14b
Apr  1 23:37:14 download-server kernel: [  718.011606]  [<ffffffff8130c7c7>] ? acpi_idle_enter_simple+0x110/0x14b
Apr  1 23:37:14 download-server kernel: [  718.011617]  [<ffffffff81448c77>] ? cpuidle_idle_call+0xa7/0x140
Apr  1 23:37:14 download-server kernel: [  718.011627]  [<ffffffff81011e63>] ? cpu_idle+0xb3/0x110
Apr  1 23:37:14 download-server kernel: [  718.011636]  [<ffffffff8154ee91>] ? start_secondary+0xa8/0xaa
Apr  1 23:37:14 download-server kernel: [  718.011643] ---[ end trace 62bcf8b592c12c43 ]---

The current kernel is the 2.6.32-17-server build from Ubuntu 10.04 beta.
Comment 7 Richard Wall 2010-04-26 09:55:07 UTC
I'm seeing a similar problem. We have a pair of very busy Squid servers (peak ~15000 TCP connections, ~100Mb/sec) using identical Asus server motherboards. The networking on one box fails intermittently, sometimes after a few hours, sometimes after as much as 7 days.

Here's the relevant info and an extract from syslog / dmesg:

# uname -r
2.6.31.12

# cat /sys/class/net/eth0/device/vendor 
0x14e4
# cat /sys/class/net/eth0/device/device 
0x1659

Lookup at http://www.pcidatabase.com/vendor_details.php?id=767 for device 0x1659:
Chip Number: BCM5721
Chip Description: NetXtreme Gigabit Ethernet PCI Express

# ethtool -i eth0
driver: tg3
version: 3.99
firmware-version: 5721-v3.65
bus-info: 0000:03:00.0
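
The identification steps above (vendor and device IDs from sysfs, driver name from ethtool) can be collected in one go, which makes it easier to compare the hardware across the reports in this bug. This is a minimal sketch under a few assumptions: `describe_nic` is a made-up helper name, the interface is backed by a real device (virtual interfaces have no `device` directory), and the driver name is taken from the `device/driver` symlink in sysfs instead of from ethtool.

```python
import os


def describe_nic(iface, sysfs_root="/sys/class/net"):
    """Collect PCI vendor/device IDs and the bound driver name for one interface."""
    dev_dir = os.path.join(sysfs_root, iface, "device")
    info = {}
    for key in ("vendor", "device"):
        with open(os.path.join(dev_dir, key)) as fh:
            info[key] = fh.read().strip()
    # "driver" is a symlink to the driver's directory; its basename is the
    # driver name (e.g. tg3). It is missing when no driver is bound.
    driver_link = os.path.join(dev_dir, "driver")
    if os.path.exists(driver_link):
        info["driver"] = os.path.basename(os.path.realpath(driver_link))
    else:
        info["driver"] = None
    return info
```

On the box above, `describe_nic("eth0")` would be expected to report the same vendor (0x14e4), device (0x1659) and driver (tg3) as the manual commands; the firmware version still has to come from `ethtool -i`.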

# cat /var/log/syslog
Apr 22 23:49:35  kernel: [623449.988504] ------------[ cut here ]------------
Apr 22 23:49:35  kernel: [623449.988511] WARNING: at net/sched/sch_generic.c:246 dev_watchdog+0x1be/0x1d0()
Apr 22 23:49:35  kernel: [623449.988514] Hardware name: System Product Name
Apr 22 23:49:35  kernel: [623449.988516] NETDEV WATCHDOG: eth0 (tg3): transmit queue 0 timed out
Apr 22 23:49:35  kernel: [623449.988518] Modules linked in: ip_gre e1000 via_rhine tg3 libphy r8169 pcnet32 e100 8139too mii w83627ehf vt8231 via686a hwmon_vid coretemp asus_atk0110 hwmon
Apr 22 23:49:35  kernel: [623449.988533] Pid: 0, comm: swapper Not tainted 2.6.31.12 #2
Apr 22 23:49:35  kernel: [623449.988535] Call Trace:
Apr 22 23:49:35  kernel: [623449.988540]  [<c012864e>] ? warn_slowpath_common+0x6e/0xb0
Apr 22 23:49:35  kernel: [623449.988543]  [<c034e4fe>] ? dev_watchdog+0x1be/0x1d0
Apr 22 23:49:35  kernel: [623449.988546]  [<c01286db>] ? warn_slowpath_fmt+0x2b/0x30
Apr 22 23:49:35  kernel: [623449.988549]  [<c034e4fe>] ? dev_watchdog+0x1be/0x1d0
Apr 22 23:49:35  kernel: [623449.988553]  [<c011f842>] ? __wake_up+0x42/0x60
Apr 22 23:49:35  kernel: [623449.988557]  [<c01376f2>] ? insert_work+0x42/0x50
Apr 22 23:49:35  kernel: [623449.988560]  [<c034e340>] ? dev_watchdog+0x0/0x1d0
Apr 22 23:49:35  kernel: [623449.988564]  [<c0131149>] ? run_timer_softirq+0xf9/0x1c0
Apr 22 23:49:35  kernel: [623449.988567]  [<c012d2c0>] ? __do_softirq+0x80/0x100
Apr 22 23:49:35  kernel: [623449.988570]  [<c012d36d>] ? do_softirq+0x2d/0x40
Apr 22 23:49:35  kernel: [623449.988574]  [<c0114d94>] ? smp_apic_timer_interrupt+0x54/0x90
Apr 22 23:49:35  kernel: [623449.988577]  [<c0103676>] ? apic_timer_interrupt+0x2a/0x30
Apr 22 23:49:35  kernel: [623449.988581]  [<c03f00d8>] ? klist_add_before+0x18/0x50
Apr 22 23:49:35  kernel: [623449.988585]  [<c0109dc2>] ? mwait_idle+0x42/0x60
Apr 22 23:49:35  kernel: [623449.988587]  [<c0101d55>] ? cpu_idle+0x35/0x60
Apr 22 23:49:35  kernel: [623449.988590] ---[ end trace 346a74434bf31555 ]---
Apr 22 23:49:35  kernel: [623449.988592] tg3: eth0: transmit timed out, resetting
Apr 22 23:49:35  kernel: [623449.988596] tg3: DEBUG: MAC_TX_STATUS[0000000f] MAC_RX_STATUS[00000008]
Apr 22 23:49:35  kernel: [623449.988601] tg3: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000000]
Apr 22 23:49:35  kernel: [623450.089694] tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
Apr 22 23:49:35  kernel: [623450.248154] tg3: eth0: Link is down.
