Bug 56631 - niu Sun Neptune 10G fiber card - "transmit timed out"
Summary: niu Sun Neptune 10G fiber card - "transmit timed out"
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: All Linux
Importance: P1 blocking
Assignee: drivers_network@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-04-15 13:25 UTC by arb
Modified: 2016-02-18 12:27 UTC
CC List: 3 users

See Also:
Kernel Version: 3.9.0
Subsystem:
Regression: Yes
Bisected commit-id:


Description arb 2013-04-15 13:25:32 UTC
[See bug 10801, which is still not solved]

The system hangs with the message
"niu: xxx: eth2: Transmit timed out, resetting"
It then remains hung, with messages output every 5 seconds.

The following has also been observed, though not every time:
WARNING: at sch_generic:255 dev_watchdog
NETDEV WATCHDOG: eth2 (niu): transmit queue xx timed out

This has been seen in kernels 3.2, 3.5 and 3.9.
Hardware: "Sun Fire X2250" with "Oracle/SUN Multithreaded 10-Gigabit Ethernet Network Controller (rev 01)"
Software: Ubuntu Precise 12.04.2 running as an NFS client
Comment 1 arb 2013-11-19 11:05:36 UTC
Additional driver messages were turned on via: ethtool -s eth2 msglvl $((0x7fff))
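
For anyone reproducing this, the msglvl value given to ethtool is a bitmask of the kernel's NETIF_MSG_* message classes, so 0x7fff should enable all fifteen of them. A minimal sketch that decodes such a mask is below. The flag names are the ones ethtool uses; the function name decode_msglvl is only illustrative and not part of any tool.

```python
# Bit values follow the NETIF_MSG_* flags in include/linux/netdevice.h;
# the short names are the ones ethtool accepts for "msglvl".
NETIF_MSG_FLAGS = {
    0x0001: "drv",
    0x0002: "probe",
    0x0004: "link",
    0x0008: "timer",
    0x0010: "ifdown",
    0x0020: "ifup",
    0x0040: "rx_err",
    0x0080: "tx_err",
    0x0100: "tx_queued",
    0x0200: "intr",
    0x0400: "tx_done",
    0x0800: "rx_status",
    0x1000: "pktdata",
    0x2000: "hw",
    0x4000: "wol",
}


def decode_msglvl(level):
    """Return the message-class names enabled by a msglvl bitmask.

    decode_msglvl is a hypothetical helper written for this report.
    """
    return [name for bit, name in sorted(NETIF_MSG_FLAGS.items()) if level & bit]
```

Calling decode_msglvl(0x7fff) lists every class, which is why the logs below include the per-interrupt niu_interrupt(), niu_poll_core() and niu_tx_work() lines. A narrower mask would probably be needed on a busy link to keep the log volume manageable.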

When it times out we see these messages:

Aug 22 14:18:44 hostname kernel:
[3061696.754461] niu: niu_interrupt() ldg[ffff8807141d0e60](3) v0[1] v1[0] v2[0]
[3061696.754467] niu 0000:09:00.0: eth2: niu_rxchan_intr() stat[800000010001]
[3061696.754471] niu 0000:09:00.0: eth2: niu_poll_core() v0[0000000000000001]
[3061696.754476] niu 0000:09:00.0: eth2: niu_rx_work(chan[0]), stat[0] qlen=1
[3061696.816008] niu 0000:09:00.0: eth2: Transmit timed out, resetting
[3061696.816684] niu 0000:09:00.0: eth2: Disable interrupts
[3061696.816704] niu 0000:09:00.0: eth2: Disable RX MAC
[3061696.816707] niu 0000:09:00.0: eth2: Disable IPP
[3061696.816713] niu 0000:09:00.0: eth2: Stop TX channels
[3061696.816736] niu 0000:09:00.0: eth2: Stop RX channels
[3061696.816747] niu 0000:09:00.0: eth2: Reset TX channels
[3061696.816772] niu 0000:09:00.0: eth2: Reset RX channels
[3061696.817410] niu 0000:09:00.0: eth2: Initialize TXC
[3061696.817413] niu 0000:09:00.0: eth2: Initialize TX channels
[3061696.817454] niu 0000:09:00.0: eth2: Initialize RX channels
[3061696.817501] niu 0000:09:00.0: eth2: Initialize classifier
[3061696.820001] niu 0000:09:00.0: eth2: Initialize ZCP
[3061696.820001] niu 0000:09:00.0: eth2: Initialize IPP
[3061696.820001] niu 0000:09:00.0: eth2: Initialize MAC
[3061696.838493] niu: niu_interrupt() ldg[ffff8807141d16d0](18) v0[8000000000] v1[0] v2[0]
[3061696.838500] niu 0000:09:00.0: eth2: niu_txchan_intr() cs[e00020000c000]
[3061696.838502] niu 0000:09:00.0: eth2: niu_poll_core() v0[0000008000000000]
[3061696.838505] niu 0000:09:00.0: eth2: niu_tx_work() pkt_cnt[14] cons[0]


When it completely locks up we see these messages:

Aug 26 14:42:49 hostname kernel:
[3408740.815983] niu: niu_interrupt() ldg[ffff8807141d16d0](18) v0[8000000000] v1[0] v2[0]
[3408740.815987] niu 0000:09:00.0: eth2: niu_txchan_intr() cs[b860b860000c000]
[3408740.815989] niu 0000:09:00.0: eth2: niu_poll_core() v0[0000008000000000]
[3408740.815990] niu 0000:09:00.0: eth2: niu_tx_work() pkt_cnt[0] cons[119]
[3408740.815993] niu: niu_interrupt() ldg[ffff8807141d16d0](18) v0[8000000000] v1[0] v2[0]
[3408740.815997] niu 0000:09:00.0: eth2: niu_txchan_intr() cs[b860b860000c000]
[3408740.816006] niu 0000:09:00.0: eth2: niu_poll_core() v0[0000008000000000]
[3408740.816007] niu 0000:09:00.0: eth2: niu_tx_work() pkt_cnt[0] cons[119]
[3408740.816010] niu: niu_interrupt() ldg[ffff8807141d16d0](18) v0[8000000000] v1[0] v2[0]
[3408740.816015] niu 0000:09:00.0: eth2: niu_txchan_intr() cs[b860b860000c000]
[3408740.816016] niu 0000:09:00.0: eth2: Transmit timed out, resetting
[3408740.816017] niu 0000:09:00.0: eth2: niu_poll_core() v0[0000008000000000]
[3408740.816019] niu 0000:09:00.0: eth2: niu_tx_work() pkt_cnt[0] cons[119]
[3408740.816022] niu: niu_interrupt() ldg[ffff8807141d16d0](18) v0[8000000000] v1[0] v2[0]
[3408740.816026] niu 0000:09:00.0: eth2: niu_txchan_intr() cs[b860b860000c000]
[3408740.816028] niu 0000:09:00.0: eth2: niu_poll_core() v0[0000008000000000]
[3408740.816029] niu 0000:09:00.0: eth2: niu_tx_work() pkt_cnt[0] cons[119]
[3408740.816032] niu: niu_interrupt() ldg[ffff8807141d16d0](18) v0[8000000000] v1[0] v2[0]
[3408740.816036] niu 0000:09:00.0: eth2: niu_txchan_intr() cs[b860b860000c000]
[3408740.816038] niu 0000:09:00.0: eth2: niu_poll_core() v0[0000008000000000]
[3408740.816040] niu 0000:09:00.0: eth2: niu_tx_work() pkt_cnt[0] cons[119]
[3408740.816042] niu: niu_interrupt() ldg[ffff8807141d16d0](18) v0[8000000000] v1[0] v2[0]
[3408740.820004] [sched_delayed] sched: RT throttling activated
[3408740.824021] niu 0000:09:00.0: eth2: Disable interrupts
[3408740.824044] niu 0000:09:00.0: eth2: Disable RX MAC
[3408740.824048] niu 0000:09:00.0: eth2: Disable IPP
[3408740.824054] niu 0000:09:00.0: eth2: Stop TX channels
[3408740.824641] niu 0000:09:00.0: eth2: Stop RX channels
[3408740.824652] niu 0000:09:00.0: eth2: Reset TX channels
[3408740.825212] niu 0000:09:00.0: eth2: Reset RX channels
[3408740.825999] niu 0000:09:00.0: eth2: Initialize TXC
[3408740.826002] niu 0000:09:00.0: eth2: Initialize TX channels

I wonder if it's locking up in or after the "Initialize TX channels" phase?
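
To spot the lockup pattern in a larger log, something like the following sketch could help. It assumes, as an interpretation of the trace above and not something confirmed from the driver source, that consecutive niu_tx_work() lines with pkt_cnt[0] and an unchanged cons value mean the TX ring is making no progress. The function name find_stuck_tx and the threshold of 3 are arbitrary choices for illustration.

```python
import re

# Matches e.g. "... eth2: niu_tx_work() pkt_cnt[0] cons[119]"
TX_WORK = re.compile(r"(\S+): niu_tx_work\(\) pkt_cnt\[(\d+)\] cons\[(\d+)\]")


def find_stuck_tx(lines, threshold=3):
    """Return (interface, cons) pairs that look stuck.

    A pair is reported once when `threshold` consecutive niu_tx_work()
    lines show pkt_cnt[0] with the same interface and cons index.
    The "stuck" criterion is an assumption, not driver-defined behaviour.
    """
    stuck = []
    last = None  # (interface, cons) of the current run
    run = 0      # length of the current run of no-progress lines
    for line in lines:
        match = TX_WORK.search(line)
        if not match:
            continue  # ignore unrelated log lines
        iface = match.group(1)
        pkt_cnt = int(match.group(2))
        cons = int(match.group(3))
        key = (iface, cons)
        if pkt_cnt == 0 and key == last:
            run += 1
        elif pkt_cnt == 0:
            last, run = key, 1
        else:
            last, run = None, 0  # packets were reclaimed, so reset
        if run == threshold:
            stuck.append(key)
    return stuck
```

Feeding it the kernel log as a list of lines, for example open("/var/log/kern.log").read().splitlines(), should flag eth2 at cons 119 for the Aug 26 trace and nothing for the Aug 22 trace, where niu_tx_work() reports pkt_cnt[14].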
Comment 2 Beary White 2014-11-08 01:05:08 UTC
We've seen this issue appear twice on a Sun Fire X4440 server in the past 6 months. The last time it occurred we moved the fiber to the other 10G port and updated the kernel. The bug still seems to exist, however.


Nov  7 23:07:43 hostname kernel: ------------[ cut here ]------------
Nov  7 23:07:43 hostname kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26d/0x280() (Tainted: P           ---------------   )
Nov  7 23:07:43 hostname kernel: Hardware name: Sun Fire X4440
Nov  7 23:07:43 hostname kernel: NETDEV WATCHDOG: eth4 (niu): transmit queue 0 timed out
Nov  7 23:07:43 hostname kernel: Modules linked in: cvfs(P)(U) cpufreq_ondemand powernow_k8 freq_table mperf ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables ipv6 tcp_scalable microcode k10temp amd64_edac_mod edac_core edac_mce_amd sg niu forcedeth i2c_nforce2 i2c_core ext3 jbd mbcache sd_mod crc_t10dif qla2xxx scsi_transport_fc scsi_tgt mptsas mptscsih mptbase scsi_transport_sas pata_acpi ata_generic sata_nv dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Nov  7 23:07:43 hostname kernel: Pid: 0, comm: swapper Tainted: P           ---------------    2.6.32-358.23.2.el6.x86_64 #1
Nov  7 23:07:43 hostname kernel: Call Trace:
Nov  7 23:07:43 hostname kernel: <IRQ>  [<ffffffff8106e3e7>] ? warn_slowpath_common+0x87/0xc0
Nov  7 23:07:43 hostname kernel: [<ffffffff8106e4d6>] ? warn_slowpath_fmt+0x46/0x50
Nov  7 23:07:43 hostname kernel: [<ffffffff81467ffd>] ? dev_watchdog+0x26d/0x280
Nov  7 23:07:43 hostname kernel: [<ffffffff81090e8d>] ? insert_work+0x6d/0xb0
Nov  7 23:07:43 hostname kernel: [<ffffffff81467d90>] ? dev_watchdog+0x0/0x280
Nov  7 23:07:43 hostname kernel: [<ffffffff81081937>] ? run_timer_softirq+0x197/0x340
Nov  7 23:07:43 hostname kernel: [<ffffffff810a8060>] ? tick_sched_timer+0x0/0xc0
Nov  7 23:07:43 hostname kernel: [<ffffffff8102ea2d>] ? lapic_next_event+0x1d/0x30
Nov  7 23:07:43 hostname kernel: [<ffffffff810770b1>] ? __do_softirq+0xc1/0x1e0
Nov  7 23:07:43 hostname kernel: [<ffffffff8109b87b>] ? hrtimer_interrupt+0x14b/0x260
Nov  7 23:07:43 hostname kernel: [<ffffffff8100c1cc>] ? call_softirq+0x1c/0x30
Nov  7 23:07:43 hostname kernel: [<ffffffff8100de05>] ? do_softirq+0x65/0xa0
Nov  7 23:07:43 hostname kernel: [<ffffffff81076e95>] ? irq_exit+0x85/0x90
Nov  7 23:07:43 hostname kernel: [<ffffffff81517860>] ? smp_apic_timer_interrupt+0x70/0x9b
Nov  7 23:07:43 hostname kernel: [<ffffffff8100bb93>] ? apic_timer_interrupt+0x13/0x20
Nov  7 23:07:43 hostname kernel: <EOI>  [<ffffffff8103b92b>] ? native_safe_halt+0xb/0x10
Nov  7 23:07:43 hostname kernel: [<ffffffff81014a5d>] ? default_idle+0x4d/0xb0
Nov  7 23:07:43 hostname kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
Nov  7 23:07:43 hostname kernel: [<ffffffff81507600>] ? start_secondary+0x2ac/0x2ef
Nov  7 23:07:43 hostname kernel: ---[ end trace b1c05f26a68821c6 ]---
Nov  7 23:07:43 hostname kernel: niu 0000:03:00.0: niu: eth4: Transmit timed out, resetting
