Most recent kernel where this bug did not occur: not known Distribution: Debian Hardware Environment: two PCs as UDP client/server hosts Software Environment: UDP socket applications Problem Description: Incoming UDP datagrams are counted erratic (InError) and as `delivered' (InDatagram) if their checksum was invalidated. I sent 50,000 UDP datagrams, 3122 deliberately corrupted in transit to produce checksum failures. The receiving host stated 48484 InDatagrams, 2 NoPorts, and 3119 InErrors. The sum of these is 51605; the difference of 1605 is identical with the number of datagrams erroneously counted twice. The InDatagram count is wrong: in reality, only 48484 - 1605 = 46879 datagrams were delivered (confirmed by receiving application). Analysis: UDP datagrams are passed on via ip_local_deliver_finish() to udp_rcv(), which calls udp_queue_rcv_skb(). At this point, the checksum of the datagram is verified only if the sk_filter is set. Next udp_queue_rcv_skb() increments the InDatagrams (UDP_MIB_INDATAGRAMS) variable. Once udp_recvmsg() is called as a handler for incoming UDP datagrams, the checksum is verified for the first time (unless sk_filter was set); if the checksum fails, the `goto csum_copy_err' leads to forcibly removing the datagram and incrementing InErrors (UDP_MIB_INERRORS). Now the datagram has been counted twice: once as InDatagram and once as InErrors. This is \wrong\: RFC 2013 defines InDatagrams as counter for delivered datagrams; these datagrams are counted but never delivered. The consequence is that InDatagrams + InErrors + NoPorts exceeds the count of actual datagrams (the difference is precisely the number of datagrams which have thus been counted twice). Steps to reproduce: Send UDP datagrams with checksums enabled, use middlebox which corrupts part of the traffic (e.g. bit errors / NetEm) and watch UDP counters in /proc/net/snmp. The sum of InErrors, NoPorts and InDatagrams exceeds the real number of sent datagrams. You can add a counter or printk message underneath csum_copy_err:, the number of increments equals the times datagrams are counted twice.
Created attachment 8270 [details] Patch to fix doubly-counted udp.c InDatagrams Fix: Move the `UDP_INC_STATS_BH(UDP_MIB_INDATAGRAMS)' statement from udp_queue_rcv_skb to udp_recvmsg. Now InDatagrams only counts those datagrams which were really delivered (as per RFC 2013). This lead to correct counting behaviour (verified in 4 sets of 9x 50000 UDP datagrams) on both rc and stable kernel.
Created attachment 8289 [details] Fixes doubly-counted datagrams which failed UDP checksum in udp_recvmsg(). This patch improves on the previous one: the previous one would break applications such as NFS which do not use udp_recvmsg(). This solution was discussed on the mailing list and is based on decrementing InDatagrams when an error occurred.
sorry, i accidently changed this bug...
putting bug back to previous state
Comment on attachment 8289 [details] Fixes doubly-counted datagrams which failed UDP checksum in udp_recvmsg(). This one does not take udp_poll() into account (where the same problem as in udp_recvmsg() crops up).
Created attachment 9125 [details] Fixes doubly-counted datagrams which failed UDP checksum in udp_recvmsg() AND in udp_poll() This is an update, the previous patch did not take udp_poll() into consideration. A cleaner solution would be to instead shift the increment of MIB_DATAGRAMS out of udp_queue_rcv_skb() to udp_recvmsg(); this however entails (i) revise all clients that use data_ready handler (sunrpc) (ii) decide what to do with udp_poll()