Bug 56631
Summary: | niu Sun Neptune 10G fiber card - "transmit timed out" | ||
---|---|---|---|
Product: | Drivers | Reporter: | arb |
Component: | Network | Assignee: | drivers_network (drivers_network) |
Status: | NEW --- | ||
Severity: | blocking | CC: | alan, szg00000, thomas.herritt |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 3.9.0 | Subsystem: | |
Regression: | Yes | Bisected commit-id: |
Description
arb
2013-04-15 13:25:32 UTC
With additional messages turned on via

ethtool -s eth2 msglvl $((0x7fff))

we see these messages when it times out:

Aug 22 14:18:44 hostname kernel: [3061696.754461] niu: niu_interrupt() ldg[ffff8807141d0e60](3) v0[1] v1[0] v2[0]
[3061696.754467] niu 0000:09:00.0: eth2: niu_rxchan_intr() stat[800000010001]
[3061696.754471] niu 0000:09:00.0: eth2: niu_poll_core() v0[0000000000000001]
[3061696.754476] niu 0000:09:00.0: eth2: niu_rx_work(chan[0]), stat[0] qlen=1
[3061696.816008] niu 0000:09:00.0: eth2: Transmit timed out, resetting
[3061696.816684] niu 0000:09:00.0: eth2: Disable interrupts
[3061696.816704] niu 0000:09:00.0: eth2: Disable RX MAC
[3061696.816707] niu 0000:09:00.0: eth2: Disable IPP
[3061696.816713] niu 0000:09:00.0: eth2: Stop TX channels
[3061696.816736] niu 0000:09:00.0: eth2: Stop RX channels
[3061696.816747] niu 0000:09:00.0: eth2: Reset TX channels
[3061696.816772] niu 0000:09:00.0: eth2: Reset RX channels
[3061696.817410] niu 0000:09:00.0: eth2: Initialize TXC
[3061696.817413] niu 0000:09:00.0: eth2: Initialize TX channels
[3061696.817454] niu 0000:09:00.0: eth2: Initialize RX channels
[3061696.817501] niu 0000:09:00.0: eth2: Initialize classifier
[3061696.820001] niu 0000:09:00.0: eth2: Initialize ZCP
[3061696.820001] niu 0000:09:00.0: eth2: Initialize IPP
[3061696.820001] niu 0000:09:00.0: eth2: Initialize MAC
[3061696.838493] niu: niu_interrupt() ldg[ffff8807141d16d0](18) v0[8000000000] v1[0] v2[0]
[3061696.838500] niu 0000:09:00.0: eth2: niu_txchan_intr() cs[e00020000c000]
[3061696.838502] niu 0000:09:00.0: eth2: niu_poll_core() v0[0000008000000000]
[3061696.838505] niu 0000:09:00.0: eth2: niu_tx_work() pkt_cnt[14] cons[0]

When it completely locks up we see these messages:

Aug 26 14:42:49 hostname kernel: [3408740.815983] niu: niu_interrupt() ldg[ffff8807141d16d0](18) v0[8000000000] v1[0] v2[0]
[3408740.815987] niu 0000:09:00.0: eth2: niu_txchan_intr() cs[b860b860000c000]
[3408740.815989] niu 0000:09:00.0: eth2: niu_poll_core() v0[0000008000000000]
[3408740.815990] niu 0000:09:00.0: eth2: niu_tx_work() pkt_cnt[0] cons[119]
[3408740.815993] niu: niu_interrupt() ldg[ffff8807141d16d0](18) v0[8000000000] v1[0] v2[0]
[3408740.815997] niu 0000:09:00.0: eth2: niu_txchan_intr() cs[b860b860000c000]
[3408740.816006] niu 0000:09:00.0: eth2: niu_poll_core() v0[0000008000000000]
[3408740.816007] niu 0000:09:00.0: eth2: niu_tx_work() pkt_cnt[0] cons[119]
[3408740.816010] niu: niu_interrupt() ldg[ffff8807141d16d0](18) v0[8000000000] v1[0] v2[0]
[3408740.816015] niu 0000:09:00.0: eth2: niu_txchan_intr() cs[b860b860000c000]
[3408740.816016] niu 0000:09:00.0: eth2: Transmit timed out, resetting
[3408740.816017] niu 0000:09:00.0: eth2: niu_poll_core() v0[0000008000000000]
[3408740.816019] niu 0000:09:00.0: eth2: niu_tx_work() pkt_cnt[0] cons[119]
[3408740.816022] niu: niu_interrupt() ldg[ffff8807141d16d0](18) v0[8000000000] v1[0] v2[0]
[3408740.816026] niu 0000:09:00.0: eth2: niu_txchan_intr() cs[b860b860000c000]
[3408740.816028] niu 0000:09:00.0: eth2: niu_poll_core() v0[0000008000000000]
[3408740.816029] niu 0000:09:00.0: eth2: niu_tx_work() pkt_cnt[0] cons[119]
[3408740.816032] niu: niu_interrupt() ldg[ffff8807141d16d0](18) v0[8000000000] v1[0] v2[0]
[3408740.816036] niu 0000:09:00.0: eth2: niu_txchan_intr() cs[b860b860000c000]
[3408740.816038] niu 0000:09:00.0: eth2: niu_poll_core() v0[0000008000000000]
[3408740.816040] niu 0000:09:00.0: eth2: niu_tx_work() pkt_cnt[0] cons[119]
[3408740.816042] niu: niu_interrupt() ldg[ffff8807141d16d0](18) v0[8000000000] v1[0] v2[0]
[3408740.820004] [sched_delayed] sched: RT throttling activated
[3408740.824021] niu 0000:09:00.0: eth2: Disable interrupts
[3408740.824044] niu 0000:09:00.0: eth2: Disable RX MAC
[3408740.824048] niu 0000:09:00.0: eth2: Disable IPP
[3408740.824054] niu 0000:09:00.0: eth2: Stop TX channels
[3408740.824641] niu 0000:09:00.0: eth2: Stop RX channels
[3408740.824652] niu 0000:09:00.0: eth2: Reset TX channels
[3408740.825212] niu 0000:09:00.0: eth2: Reset RX channels
[3408740.825999] niu 0000:09:00.0: eth2: Initialize TXC
[3408740.826002] niu 0000:09:00.0: eth2: Initialize TX channels

I wonder if it's locking up in or after the "Initialize TX channels" phase?

We've seen this issue appear twice on a Sun Fire X4440 server in the past 6 months. The last time it occurred we moved the fiber to the other 10G port and updated the kernel. It seems this bug still exists, however.

Nov 7 23:07:43 hostname kernel: ------------[ cut here ]------------
Nov 7 23:07:43 hostname kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26d/0x280() (Tainted: P --------------- )
Nov 7 23:07:43 hostname kernel: Hardware name: Sun Fire X4440
Nov 7 23:07:43 hostname kernel: NETDEV WATCHDOG: eth4 (niu): transmit queue 0 timed out
Nov 7 23:07:43 hostname kernel: Modules linked in: cvfs(P)(U) cpufreq_ondemand powernow_k8 freq_table mperf ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables ipv6 tcp_scalable microcode k10temp amd64_edac_mod edac_core edac_mce_amd sg niu forcedeth i2c_nforce2 i2c_core ext3 jbd mbcache sd_mod crc_t10dif qla2xxx scsi_transport_fc scsi_tgt mptsas mptscsih mptbase scsi_transport_sas pata_acpi ata_generic sata_nv dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Nov 7 23:07:43 hostname kernel: Pid: 0, comm: swapper Tainted: P --------------- 2.6.32-358.23.2.el6.x86_64 #1
Nov 7 23:07:43 hostname kernel: Call Trace:
Nov 7 23:07:43 hostname kernel: <IRQ> [<ffffffff8106e3e7>] ? warn_slowpath_common+0x87/0xc0
Nov 7 23:07:43 hostname kernel: [<ffffffff8106e4d6>] ? warn_slowpath_fmt+0x46/0x50
Nov 7 23:07:43 hostname kernel: [<ffffffff81467ffd>] ? dev_watchdog+0x26d/0x280
Nov 7 23:07:43 hostname kernel: [<ffffffff81090e8d>] ? insert_work+0x6d/0xb0
Nov 7 23:07:43 hostname kernel: [<ffffffff81467d90>] ? dev_watchdog+0x0/0x280
Nov 7 23:07:43 hostname kernel: [<ffffffff81081937>] ? run_timer_softirq+0x197/0x340
Nov 7 23:07:43 hostname kernel: [<ffffffff810a8060>] ? tick_sched_timer+0x0/0xc0
Nov 7 23:07:43 hostname kernel: [<ffffffff8102ea2d>] ? lapic_next_event+0x1d/0x30
Nov 7 23:07:43 hostname kernel: [<ffffffff810770b1>] ? __do_softirq+0xc1/0x1e0
Nov 7 23:07:43 hostname kernel: [<ffffffff8109b87b>] ? hrtimer_interrupt+0x14b/0x260
Nov 7 23:07:43 hostname kernel: [<ffffffff8100c1cc>] ? call_softirq+0x1c/0x30
Nov 7 23:07:43 hostname kernel: [<ffffffff8100de05>] ? do_softirq+0x65/0xa0
Nov 7 23:07:43 hostname kernel: [<ffffffff81076e95>] ? irq_exit+0x85/0x90
Nov 7 23:07:43 hostname kernel: [<ffffffff81517860>] ? smp_apic_timer_interrupt+0x70/0x9b
Nov 7 23:07:43 hostname kernel: [<ffffffff8100bb93>] ? apic_timer_interrupt+0x13/0x20
Nov 7 23:07:43 hostname kernel: <EOI> [<ffffffff8103b92b>] ? native_safe_halt+0xb/0x10
Nov 7 23:07:43 hostname kernel: [<ffffffff81014a5d>] ? default_idle+0x4d/0xb0
Nov 7 23:07:43 hostname kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
Nov 7 23:07:43 hostname kernel: [<ffffffff81507600>] ? start_secondary+0x2ac/0x2ef
Nov 7 23:07:43 hostname kernel: ---[ end trace b1c05f26a68821c6 ]---
Nov 7 23:07:43 hostname kernel: niu 0000:03:00.0: niu: eth4: Transmit timed out, resetting
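
To help narrow down whether the reset stalls in or after "Initialize TX channels", the following is a rough sketch of a log scanner, not part of the driver or any existing tool. For each "Transmit timed out, resetting" message it reports the last reset stage that was logged afterwards. The stage list is copied only from the recovering log excerpt in this report; the function name last_reset_stage and the assumption that the stages always appear in this order are mine, and the driver may log further stages not shown here.

```python
import re

# Reset stages in the order seen in the recovering log excerpt of this report.
# Assumption: the driver always logs them in this order when msglvl is raised.
RESET_STAGES = [
    "Disable interrupts",
    "Disable RX MAC",
    "Disable IPP",
    "Stop TX channels",
    "Stop RX channels",
    "Reset TX channels",
    "Reset RX channels",
    "Initialize TXC",
    "Initialize TX channels",
    "Initialize RX channels",
    "Initialize classifier",
    "Initialize ZCP",
    "Initialize IPP",
    "Initialize MAC",
]

TIMEOUT_MARKER = "Transmit timed out, resetting"


def last_reset_stage(log_text):
    """Return, for each timeout in log_text, the last reset stage logged after it.

    An entry is None if no reset stage follows the timeout message at all.
    """
    results = []
    # Everything between one timeout marker and the next belongs to that reset.
    for chunk in log_text.split(TIMEOUT_MARKER)[1:]:
        last = None
        pos = 0
        for stage in RESET_STAGES:
            # Match e.g. "eth2: Initialize TXC" but not "eth2: Initialize TX channels".
            m = re.search(r"eth\d+:\s+" + re.escape(stage) + r"\b", chunk[pos:])
            if m is None:
                break  # this stage was never logged; the reset stopped before it
            last = stage
            pos += m.end()
        results.append(last)
    return results


if __name__ == "__main__":
    sample = (
        "niu 0000:09:00.0: eth2: Transmit timed out, resetting\n"
        "niu 0000:09:00.0: eth2: Disable interrupts\n"
        "niu 0000:09:00.0: eth2: Disable RX MAC\n"
    )
    print(last_reset_stage(sample))
```

Run against the two Aug excerpts above, a scanner like this would be expected to report "Initialize MAC" for the recovering case and "Initialize TX channels" for the lockup, which is consistent with the hang occurring during or right after TX channel initialization, though the log alone cannot distinguish the two.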