Distribution: Kernel 2.6.20 Problem Description: In natsemi.c, a Linux PCI Ethernet driver for the NatSemi DP8381x series, I found that two fields in network device statistics structure, namely tx_errors and tx_fifo_errors, are not updated consistantly. In struct net_device_stats, tx_errors indicates packet transmit problems, and tx_fifo_errors indicates transmitter's fifo overrun. tx_fifo_errors is a sub-category of tx_errors, and whenever tx_fifo_errors is incremented, tx_errors should also be incremented accordingly. This is conformed in most places except one. In drivers/net/natsemi.c, in function static void netdev_error(struct net_device *dev, int intr_status), the suspicious code segment is as follows: if (intr_status & RxStatusFIFOOver) { if (netif_msg_rx_err(np) && netif_msg_intr(np)) { printk(KERN_NOTICE "%s: Rx status FIFO overrun\n", dev->name); } np->stats.rx_fifo_errors++; } /* Hmmmmm, it's not clear how to recover from PCI faults. */ if (intr_status & IntrPCIErr) { printk(KERN_NOTICE "%s: PCI error %#08x\n", dev->name, intr_status & IntrPCIErr); np->stats.tx_fifo_errors++; np->stats.rx_fifo_errors++; } Here, tx_fifo_errors is incremented, but there's no tx_errors updates in this function overall. This might cause inaccurate transmit error statistics. Steps to reproduce: I found this bug by using a code-checking tool.
I don't think there is a general rule that tx_fifo_errors is a subset of tx_errors. Different drivers handle this differently, and while some increment tx_errors for every kind of transmit error, some will just increment say tx_fifo_errors and leave tx_errors for unspecified problems.
In the case of PCI errors it's not particularly clear why the FIFO error counts are being incremented in the first place.
Any updates on this discussion? Needs to be discussed on netdev maybe? Thanks.
As far as I remember the discussion on netdev about this sort of issue has always suggested that as Olaf says there's no particular standard. In any case, I expect that for most FIFO errors there will also be an error reported via the descriptor which would increment the overall error counter - if that happens incrementing it again here would be redundant.
FWIW a change implementing the requested behaviour was written by Andrew Morton and merged some time ago.