Bug 61441 - Network stop working ( (bnx2): transmit queue 0 timed out)
Summary: Network stop working ( (bnx2): transmit queue 0 timed out)
Status: CLOSED OBSOLETE
Alias: None
Product: Networking
Classification: Unclassified
Component: IPV4 (show other bugs)
Hardware: x86-64 Linux
: P1 normal
Assignee: Stephen Hemminger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-09-16 10:58 UTC by Javier Barroso
Modified: 2013-11-13 14:57 UTC (History)
2 users (show)

See Also:
Kernel Version: 2.6.32.61
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description Javier Barroso 2013-09-16 10:58:36 UTC
Hello,

Do you have any hint to solve this problem ?

We have a NAS server (openfiler) with the latest 2.6.32 (.61) available kernel compiled. Every 1/2 days the server stop serving.

The next message is show in messages:

Uhhuh. NMI received for unknown reason b1 on CPU 0.
You have some hardware problem, likely on the PCI bus.
Dazed and confused, but trying to continue
------------[ cut here ]------------
WARNING: at /usr/src/linux-2.6.32.61/net/sched/sch_generic.c:261 dev_watchdog+0x247/0x260()
Hardware name: ProLiant BL460c G1
NETDEV WATCHDOG: eth0 (bnx2): transmit queue 0 timed out
Modules linked in: autofs4 nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs fuse 8021q garp stp llc bonding ipv6 dm_round_robin dm_multipath ext4 jbd2 dm_mirror dm_region_hash dm_log dm_snapshot dm_mod i5000_edac edac_core i5k_amb ipmi_si sd_mod iTCO_wdt iTCO_vendor_support bnx2 sg tg3 serio_raw hwmon ipmi_msghandler pcspkr hpilo crc_t10dif usb_storage qla2xxx scsi_transport_fc scsi_tgt shpchp cciss ext3 jbd mbcache radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: microcode]
Pid: 0, comm: swapper Not tainted 2.6.32.61-SAE #2
Call Trace:
 <IRQ>  [<ffffffff81067bbb>] warn_slowpath_common+0x7b/0xc0
 [<ffffffff81067c61>] warn_slowpath_fmt+0x41/0x50
 [<ffffffff813f6c37>] dev_watchdog+0x247/0x260
 [<ffffffff8101a1b3>] ? native_sched_clock+0x13/0x80
 [<ffffffff81079d45>] ? internal_add_timer+0xb5/0x110
 [<ffffffff81079e24>] ? cascade+0x84/0xb0
 [<ffffffff8107a956>] run_timer_softirq+0x196/0x340
 [<ffffffff81097447>] ? ktime_get+0x57/0xd0
 [<ffffffff81070305>] __do_softirq+0xd5/0x200
 [<ffffffff81091006>] ? hrtimer_interrupt+0x146/0x260
 [<ffffffff8101424c>] call_softirq+0x1c/0x30
 [<ffffffff81015bf5>] do_softirq+0x65/0xa0
 [<ffffffff810700e5>] irq_exit+0x85/0x90
 [<ffffffff8149c771>] smp_apic_timer_interrupt+0x71/0x9c
 [<ffffffff81013c13>] apic_timer_interrupt+0x13/0x20
 <EOI>  [<ffffffff8101b94f>] ? mwait_idle+0x6f/0xd0
 [<ffffffff8149a53a>] ? atomic_notifier_call_chain+0x1a/0x20
 [<ffffffff81011e66>] ? cpu_idle+0xb6/0x110
 [<ffffffff8148e7e7>] ? start_secondary+0x1fc/0x23f
---[ end trace 177d1288aad2a52a ]---
bnx2: Chip reset did not complete
bnx2: Chip reset did not complete
bnx2: Chip reset did not complete
bnx2: Chip reset did not complete
bnx2: Chip reset did not complete
bnx2: Chip reset did not complete
bnx2: Chip reset did not complete
bnx2: Chip reset did not complete
bnx2: Chip reset did not complete
bnx2: Chip reset did not complete
bnx2: Chip reset did not complete
bnx2: Chip reset did not complete
bnx2: Chip reset did not complete
bnx2: Chip reset did not complete
bnx2: Chip reset did not complete
bnx2: Chip reset did not complete
bnx2: Chip reset did not complete
bnx2: Chip reset did not complete
bnx2: Chip reset did not complete
bnx2: eth0: BUG! Tx ring full when queue awake!
bnx2: Chip reset did not complete
bnx2: eth0: BUG! Tx ring full when queue awake!
bnx2: Chip reset did not complete
bnx2: eth0: BUG! Tx ring full when queue awake!
bnx2: Chip reset did not complete
bnx2: eth0: BUG! Tx ring full when queue awake!
bnx2: Chip reset did not complete
bnx2: eth0: BUG! Tx ring full when queue awake!
bnx2: Chip reset did not complete
bnx2: eth0: BUG! Tx ring full when queue awake!
bnx2: Chip reset did not complete
bnx2: eth0: BUG! Tx ring full when queue awake!
bnx2: Chip reset did not complete
bnx2: eth0: BUG! Tx ring full when queue awake!
bnx2: Chip reset did not complete
bnx2: eth0: BUG! Tx ring full when queue awake!
bnx2: Chip reset did not complete
bnx2: eth0: BUG! Tx ring full when queue awake!
bnx2: Chip reset did not complete
bnx2: eth0: BUG! Tx ring full when queue awake!
bnx2: Chip reset did not complete
bnx2: eth0: BUG! Tx ring full when queue awake!

No log is found at:

# ipmitool  sel
SEL Information
Version          : 1.5 (v1.5, v2 compliant)
Entries          : 0
Free Space       : 1024 bytes
Percent Used     : 0%
Last Add Time    : Not Available
Last Del Time    : 08/02/2007 14:23:47
Overflow         : false
Supported Cmds   : None
pear in ipmi:

Can we configure bnx2 module to mitigate that BUG ?

Thank you very much
Comment 1 Javier Barroso 2013-09-17 09:11:13 UTC
Hello,

For reference, the same bug was found in earlier 2.6.32 kernels from openfiler. See https://forums.openfiler.com/index.php?/topic/6566-bnx2-driver-issues-bl/page__gopid__27342#entry27342

The network interfaces are:
03:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (rev 12)
07:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (rev 12)

They are working with bonding

Thank you
Comment 2 Javier Barroso 2013-09-17 10:41:38 UTC
Hello,

I has recollected , what I think it is interesting, data from sar.

Please note that though it is not the same kernel which is reported here, it is the same issue. I had that crash located in the time. If you think it is better to post the output with the last kernel and with the last crash , I can replay those commands.

# sar -n NFS  -I SUM   -n EDEV -b -w -f  /var/log/sa/sa12 -s 19:00:00 -e 19:45:00 ; \
sar -n NFSD -B -n SOCK -f /var/log/sa/sa12 -s 18:00:00 -e 19:40:00 ; \
echo "07:51:11 PM       LINUX RESTART"

Linux 2.6.32-71.18.1.el6-0.20.smp.gcc4.1.x86_64 (rhnas01) 	09/12/2013

07:00:01 PM   cswch/s
07:10:01 PM    816.95
07:20:01 PM    124.78
07:30:01 PM 151901.63 ************************* [1] 
07:40:01 PM   5386.04
Average:     41660.65

07:00:01 PM      INTR    intr/s
07:10:01 PM       sum   1182.60
07:20:01 PM       sum   2099.52
07:30:01 PM       sum   5587.73
07:40:01 PM       sum   6378.32 ************************* [2] 
Average:          sum   3651.24

07:00:01 PM       tps      rtps      wtps   bread/s   bwrtn/s
07:10:01 PM      6.91      0.26      6.65      2.10     73.17
07:20:01 PM 7154205.83 7154303.81 7154245.95 7153372.94 7153219.18 ************************* [3] 
07:30:01 PM   1989.93      0.02   1989.90      0.19  15965.17
07:40:01 PM   2643.06      0.03   2643.04      0.52  21146.24
Average:      1030.48 1899971.86   1041.05 1899725.28   8255.77

07:00:01 PM     IFACE   rxerr/s   txerr/s    coll/s  rxdrop/s  txdrop/s  txcarr/s  rxfram/s  rxfifo/s  txfifo/s
07:10:01 PM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
07:10:01 PM      eth2      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
07:10:01 PM      eth3      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
07:10:01 PM      eth0      0.00      0.00      0.00 7148628.18      0.00      0.00      0.00      0.00      0.00
07:10:01 PM      eth1      0.00      0.00      0.00 7148628.18      0.00      0.00      0.00      0.00      0.00
07:10:01 PM     bond0      0.00      0.00      0.00 14297256.35      0.00      0.00      0.00      0.00      0.00 ************************* [4]
07:10:01 PM bond0.200      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
07:10:01 PM bond0.710      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
07:10:01 PM bond0.711      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
07:20:01 PM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
07:20:01 PM      eth2      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
07:20:01 PM      eth3      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
07:20:01 PM      eth0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
07:20:01 PM      eth1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
07:20:01 PM     bond0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
07:20:01 PM bond0.200      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
07:20:01 PM bond0.710      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
07:20:01 PM bond0.711      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
07:30:01 PM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
07:30:01 PM      eth2      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
07:30:01 PM      eth3      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
07:30:01 PM      eth0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
07:30:01 PM      eth1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
07:30:01 PM     bond0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
07:30:01 PM bond0.200      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
07:30:01 PM bond0.710      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
07:30:01 PM bond0.711      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
07:40:01 PM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
07:40:01 PM      eth2      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
07:40:01 PM      eth3      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
07:40:01 PM      eth0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
07:40:01 PM      eth1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
07:40:01 PM     bond0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
07:40:01 PM bond0.200      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
07:40:01 PM bond0.710      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
07:40:01 PM bond0.711      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:           lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:         eth2      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:         eth3      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:         eth0      0.00      0.00      0.00 1899982.44      0.00      0.00      0.00      0.00      0.00
Average:         eth1      0.00      0.00      0.00 1899982.44      0.00      0.00      0.00      0.00      0.00
Average:        bond0      0.00      0.00      0.00 3799964.87      0.00      0.00      0.00      0.00      0.00
Average:    bond0.200      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:    bond0.710      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:    bond0.711      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00

07:00:01 PM    call/s retrans/s    read/s   write/s  access/s  getatt/s
07:10:01 PM      0.00      0.00      0.00      0.00      0.00      0.00
07:20:01 PM      0.00      0.00      0.00      0.00      0.00      0.00
07:30:01 PM      0.00      0.00      0.00      0.00      0.00      0.00
07:40:01 PM      0.00      0.00      0.00      0.00      0.00      0.00
Average:         0.00      0.00      0.00      0.00      0.00      0.00
Linux 2.6.32-71.18.1.el6-0.20.smp.gcc4.1.x86_64 (rhnas01) 	09/12/2013

06:00:01 PM   scall/s badcall/s  packet/s     udp/s     tcp/s     hit/s    miss/s   sread/s  swrite/s saccess/s sgetatt/s
06:10:01 PM    247.40      0.00    247.32      0.00    247.31      0.00      0.29      0.00      0.28    123.50    123.03
06:20:01 PM    247.36      0.00    247.30      0.00    247.30      0.00      0.27      0.00      0.26    123.55    122.98
06:30:01 PM    249.72      0.00    249.67      0.00    249.65      0.00      0.24      0.93      0.24    124.32    123.67
06:40:01 PM    246.90      0.00    246.87      0.00    246.86      0.00      0.24      0.00      0.24    123.43    122.68
06:50:01 PM    243.95      0.00    243.83      0.00    243.84      0.00      0.24      0.02      0.22    121.82    121.32
07:00:01 PM    248.13      0.00    248.11      0.00    248.11      0.00      0.26      0.00      0.26    123.87    123.46
07:10:01 PM    205.87      0.00    205.84      0.00    205.82      0.00      0.18      0.00      0.18    103.02    102.24
07:20:01 PM      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00 ************************* [5] 
07:30:01 PM      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:       187.74      0.00    187.70      0.00    187.69      0.00      0.19      0.11      0.19     93.74     93.28

06:00:01 PM  pgpgin/s pgpgout/s   fault/s  majflt/s
06:10:01 PM      1.71     19.59     41.19      0.00
06:20:01 PM      7.33     15.52     13.88      0.00
06:30:01 PM      8.77     17.55     15.22      0.00
06:40:01 PM      5.29     16.55     12.87      0.00
06:50:01 PM      9.90     16.92     16.35      0.00
07:00:01 PM      1.20     20.02     20.90      0.00
07:10:01 PM      1.10     18.30     48.91      0.00
07:20:01 PM      0.05      7.08     27.77      0.00
07:30:01 PM      0.05   3991.31     17.66      0.00 ************************* [6] 
Average:         3.93    457.66     23.86      0.00

06:00:01 PM    totsck    tcpsck    udpsck    rawsck   ip-frag
06:10:01 PM       267        17        26         0         0
06:20:01 PM       267        17        26         0         0
06:30:01 PM       267        17        26         0         0
06:40:01 PM       267        17        26         0         0
06:50:01 PM       267        17        26         0         0
07:00:01 PM       267        17        26         0         0
07:10:01 PM       269        17        26         0         0
07:20:01 PM       261         9        26         0         0 ************************* [7] 
07:30:01 PM       255         9        26         0         0
Average:          265        15        26         0         0
07:51:11 PM       LINUX RESTART

Chronologically, I can see from sar output:

At 07:10 PM a huge number of packages by seconds (14297256.35)  are dropped at [4]
At 07:20 PM a huge number of transfers by seconds (7154205) are issue to physical devices [3], and NFS stopped it works [5],[7]
At 07:30 PM a huge number of context switches (151901)  were created [1] and a big number (3991) of page were out to the disk [6]
At 07:40 PM a peak of interrupts by seconds (6378) happens [6]

Maybe somebody with more experience than me can tell what happened at that moment. And if we can configure kernel in some way.

Thank you very mcuh
Comment 3 Javier Barroso 2013-09-18 06:11:50 UTC
We had change disks of that server to another (most modern Blade640 G6) blade. So surely I cannot test any tip if the issue don't happen in the new environment. So if you feel closing this bug is a must, I will understand.

Thank you
Comment 4 Alan 2013-11-13 14:57:36 UTC
2.6.32 is obsolete so yes

Note You need to log in before you can comment on or make changes to this bug.