Hello, Do you have any hint to solve this problem ? We have a NAS server (openfiler) with the latest 2.6.32 (.61) available kernel compiled. Every 1/2 days the server stop serving. The next message is show in messages: Uhhuh. NMI received for unknown reason b1 on CPU 0. You have some hardware problem, likely on the PCI bus. Dazed and confused, but trying to continue ------------[ cut here ]------------ WARNING: at /usr/src/linux-2.6.32.61/net/sched/sch_generic.c:261 dev_watchdog+0x247/0x260() Hardware name: ProLiant BL460c G1 NETDEV WATCHDOG: eth0 (bnx2): transmit queue 0 timed out Modules linked in: autofs4 nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs fuse 8021q garp stp llc bonding ipv6 dm_round_robin dm_multipath ext4 jbd2 dm_mirror dm_region_hash dm_log dm_snapshot dm_mod i5000_edac edac_core i5k_amb ipmi_si sd_mod iTCO_wdt iTCO_vendor_support bnx2 sg tg3 serio_raw hwmon ipmi_msghandler pcspkr hpilo crc_t10dif usb_storage qla2xxx scsi_transport_fc scsi_tgt shpchp cciss ext3 jbd mbcache radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: microcode] Pid: 0, comm: swapper Not tainted 2.6.32.61-SAE #2 Call Trace: <IRQ> [<ffffffff81067bbb>] warn_slowpath_common+0x7b/0xc0 [<ffffffff81067c61>] warn_slowpath_fmt+0x41/0x50 [<ffffffff813f6c37>] dev_watchdog+0x247/0x260 [<ffffffff8101a1b3>] ? native_sched_clock+0x13/0x80 [<ffffffff81079d45>] ? internal_add_timer+0xb5/0x110 [<ffffffff81079e24>] ? cascade+0x84/0xb0 [<ffffffff8107a956>] run_timer_softirq+0x196/0x340 [<ffffffff81097447>] ? ktime_get+0x57/0xd0 [<ffffffff81070305>] __do_softirq+0xd5/0x200 [<ffffffff81091006>] ? hrtimer_interrupt+0x146/0x260 [<ffffffff8101424c>] call_softirq+0x1c/0x30 [<ffffffff81015bf5>] do_softirq+0x65/0xa0 [<ffffffff810700e5>] irq_exit+0x85/0x90 [<ffffffff8149c771>] smp_apic_timer_interrupt+0x71/0x9c [<ffffffff81013c13>] apic_timer_interrupt+0x13/0x20 <EOI> [<ffffffff8101b94f>] ? mwait_idle+0x6f/0xd0 [<ffffffff8149a53a>] ? atomic_notifier_call_chain+0x1a/0x20 [<ffffffff81011e66>] ? cpu_idle+0xb6/0x110 [<ffffffff8148e7e7>] ? start_secondary+0x1fc/0x23f ---[ end trace 177d1288aad2a52a ]--- bnx2: Chip reset did not complete bnx2: Chip reset did not complete bnx2: Chip reset did not complete bnx2: Chip reset did not complete bnx2: Chip reset did not complete bnx2: Chip reset did not complete bnx2: Chip reset did not complete bnx2: Chip reset did not complete bnx2: Chip reset did not complete bnx2: Chip reset did not complete bnx2: Chip reset did not complete bnx2: Chip reset did not complete bnx2: Chip reset did not complete bnx2: Chip reset did not complete bnx2: Chip reset did not complete bnx2: Chip reset did not complete bnx2: Chip reset did not complete bnx2: Chip reset did not complete bnx2: Chip reset did not complete bnx2: eth0: BUG! Tx ring full when queue awake! bnx2: Chip reset did not complete bnx2: eth0: BUG! Tx ring full when queue awake! bnx2: Chip reset did not complete bnx2: eth0: BUG! Tx ring full when queue awake! bnx2: Chip reset did not complete bnx2: eth0: BUG! Tx ring full when queue awake! bnx2: Chip reset did not complete bnx2: eth0: BUG! Tx ring full when queue awake! bnx2: Chip reset did not complete bnx2: eth0: BUG! Tx ring full when queue awake! bnx2: Chip reset did not complete bnx2: eth0: BUG! Tx ring full when queue awake! bnx2: Chip reset did not complete bnx2: eth0: BUG! Tx ring full when queue awake! bnx2: Chip reset did not complete bnx2: eth0: BUG! Tx ring full when queue awake! bnx2: Chip reset did not complete bnx2: eth0: BUG! Tx ring full when queue awake! bnx2: Chip reset did not complete bnx2: eth0: BUG! Tx ring full when queue awake! bnx2: Chip reset did not complete bnx2: eth0: BUG! Tx ring full when queue awake! No log is found at: # ipmitool sel SEL Information Version : 1.5 (v1.5, v2 compliant) Entries : 0 Free Space : 1024 bytes Percent Used : 0% Last Add Time : Not Available Last Del Time : 08/02/2007 14:23:47 Overflow : false Supported Cmds : None pear in ipmi: Can we configure bnx2 module to mitigate that BUG ? Thank you very much
Hello, For reference, the same bug was found in earlier 2.6.32 kernels from openfiler. See https://forums.openfiler.com/index.php?/topic/6566-bnx2-driver-issues-bl/page__gopid__27342#entry27342 The network interfaces are: 03:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (rev 12) 07:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (rev 12) They are working with bonding Thank you
Hello, I has recollected , what I think it is interesting, data from sar. Please note that though it is not the same kernel which is reported here, it is the same issue. I had that crash located in the time. If you think it is better to post the output with the last kernel and with the last crash , I can replay those commands. # sar -n NFS -I SUM -n EDEV -b -w -f /var/log/sa/sa12 -s 19:00:00 -e 19:45:00 ; \ sar -n NFSD -B -n SOCK -f /var/log/sa/sa12 -s 18:00:00 -e 19:40:00 ; \ echo "07:51:11 PM LINUX RESTART" Linux 2.6.32-71.18.1.el6-0.20.smp.gcc4.1.x86_64 (rhnas01) 09/12/2013 07:00:01 PM cswch/s 07:10:01 PM 816.95 07:20:01 PM 124.78 07:30:01 PM 151901.63 ************************* [1] 07:40:01 PM 5386.04 Average: 41660.65 07:00:01 PM INTR intr/s 07:10:01 PM sum 1182.60 07:20:01 PM sum 2099.52 07:30:01 PM sum 5587.73 07:40:01 PM sum 6378.32 ************************* [2] Average: sum 3651.24 07:00:01 PM tps rtps wtps bread/s bwrtn/s 07:10:01 PM 6.91 0.26 6.65 2.10 73.17 07:20:01 PM 7154205.83 7154303.81 7154245.95 7153372.94 7153219.18 ************************* [3] 07:30:01 PM 1989.93 0.02 1989.90 0.19 15965.17 07:40:01 PM 2643.06 0.03 2643.04 0.52 21146.24 Average: 1030.48 1899971.86 1041.05 1899725.28 8255.77 07:00:01 PM IFACE rxerr/s txerr/s coll/s rxdrop/s txdrop/s txcarr/s rxfram/s rxfifo/s txfifo/s 07:10:01 PM lo 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 07:10:01 PM eth2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 07:10:01 PM eth3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 07:10:01 PM eth0 0.00 0.00 0.00 7148628.18 0.00 0.00 0.00 0.00 0.00 07:10:01 PM eth1 0.00 0.00 0.00 7148628.18 0.00 0.00 0.00 0.00 0.00 07:10:01 PM bond0 0.00 0.00 0.00 14297256.35 0.00 0.00 0.00 0.00 0.00 ************************* [4] 07:10:01 PM bond0.200 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 07:10:01 PM bond0.710 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 07:10:01 PM bond0.711 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 07:20:01 PM lo 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 07:20:01 PM eth2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 07:20:01 PM eth3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 07:20:01 PM eth0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 07:20:01 PM eth1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 07:20:01 PM bond0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 07:20:01 PM bond0.200 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 07:20:01 PM bond0.710 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 07:20:01 PM bond0.711 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 07:30:01 PM lo 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 07:30:01 PM eth2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 07:30:01 PM eth3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 07:30:01 PM eth0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 07:30:01 PM eth1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 07:30:01 PM bond0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 07:30:01 PM bond0.200 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 07:30:01 PM bond0.710 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 07:30:01 PM bond0.711 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 07:40:01 PM lo 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 07:40:01 PM eth2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 07:40:01 PM eth3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 07:40:01 PM eth0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 07:40:01 PM eth1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 07:40:01 PM bond0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 07:40:01 PM bond0.200 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 07:40:01 PM bond0.710 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 07:40:01 PM bond0.711 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Average: lo 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Average: eth2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Average: eth3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Average: eth0 0.00 0.00 0.00 1899982.44 0.00 0.00 0.00 0.00 0.00 Average: eth1 0.00 0.00 0.00 1899982.44 0.00 0.00 0.00 0.00 0.00 Average: bond0 0.00 0.00 0.00 3799964.87 0.00 0.00 0.00 0.00 0.00 Average: bond0.200 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Average: bond0.710 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Average: bond0.711 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 07:00:01 PM call/s retrans/s read/s write/s access/s getatt/s 07:10:01 PM 0.00 0.00 0.00 0.00 0.00 0.00 07:20:01 PM 0.00 0.00 0.00 0.00 0.00 0.00 07:30:01 PM 0.00 0.00 0.00 0.00 0.00 0.00 07:40:01 PM 0.00 0.00 0.00 0.00 0.00 0.00 Average: 0.00 0.00 0.00 0.00 0.00 0.00 Linux 2.6.32-71.18.1.el6-0.20.smp.gcc4.1.x86_64 (rhnas01) 09/12/2013 06:00:01 PM scall/s badcall/s packet/s udp/s tcp/s hit/s miss/s sread/s swrite/s saccess/s sgetatt/s 06:10:01 PM 247.40 0.00 247.32 0.00 247.31 0.00 0.29 0.00 0.28 123.50 123.03 06:20:01 PM 247.36 0.00 247.30 0.00 247.30 0.00 0.27 0.00 0.26 123.55 122.98 06:30:01 PM 249.72 0.00 249.67 0.00 249.65 0.00 0.24 0.93 0.24 124.32 123.67 06:40:01 PM 246.90 0.00 246.87 0.00 246.86 0.00 0.24 0.00 0.24 123.43 122.68 06:50:01 PM 243.95 0.00 243.83 0.00 243.84 0.00 0.24 0.02 0.22 121.82 121.32 07:00:01 PM 248.13 0.00 248.11 0.00 248.11 0.00 0.26 0.00 0.26 123.87 123.46 07:10:01 PM 205.87 0.00 205.84 0.00 205.82 0.00 0.18 0.00 0.18 103.02 102.24 07:20:01 PM 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 ************************* [5] 07:30:01 PM 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Average: 187.74 0.00 187.70 0.00 187.69 0.00 0.19 0.11 0.19 93.74 93.28 06:00:01 PM pgpgin/s pgpgout/s fault/s majflt/s 06:10:01 PM 1.71 19.59 41.19 0.00 06:20:01 PM 7.33 15.52 13.88 0.00 06:30:01 PM 8.77 17.55 15.22 0.00 06:40:01 PM 5.29 16.55 12.87 0.00 06:50:01 PM 9.90 16.92 16.35 0.00 07:00:01 PM 1.20 20.02 20.90 0.00 07:10:01 PM 1.10 18.30 48.91 0.00 07:20:01 PM 0.05 7.08 27.77 0.00 07:30:01 PM 0.05 3991.31 17.66 0.00 ************************* [6] Average: 3.93 457.66 23.86 0.00 06:00:01 PM totsck tcpsck udpsck rawsck ip-frag 06:10:01 PM 267 17 26 0 0 06:20:01 PM 267 17 26 0 0 06:30:01 PM 267 17 26 0 0 06:40:01 PM 267 17 26 0 0 06:50:01 PM 267 17 26 0 0 07:00:01 PM 267 17 26 0 0 07:10:01 PM 269 17 26 0 0 07:20:01 PM 261 9 26 0 0 ************************* [7] 07:30:01 PM 255 9 26 0 0 Average: 265 15 26 0 0 07:51:11 PM LINUX RESTART Chronologically, I can see from sar output: At 07:10 PM a huge number of packages by seconds (14297256.35) are dropped at [4] At 07:20 PM a huge number of transfers by seconds (7154205) are issue to physical devices [3], and NFS stopped it works [5],[7] At 07:30 PM a huge number of context switches (151901) were created [1] and a big number (3991) of page were out to the disk [6] At 07:40 PM a peak of interrupts by seconds (6378) happens [6] Maybe somebody with more experience than me can tell what happened at that moment. And if we can configure kernel in some way. Thank you very mcuh
We had change disks of that server to another (most modern Blade640 G6) blade. So surely I cannot test any tip if the issue don't happen in the new environment. So if you feel closing this bug is a must, I will understand. Thank you
2.6.32 is obsolete so yes