Most recent kernel where this bug did *NOT* occur: 2.6.18
Distribution:
Hardware Environment: ThinkPad T41
# lspci | grep Ether
02:01.0 Ethernet controller: Intel Corporation 82540EP Gigabit Ethernet Controller (Mobile) (rev 03)
02:02.0 Ethernet controller: Atheros Communications, Inc. AR5212 802.11abg NIC (rev 01)
Software Environment: Gentoo Linux (stable), e1000 driver for the Gigabit network card
Problem Description: all UDP packets created by my localhost have a wrong checksum, whereas the received UDP packets are ok
Steps to reproduce:
initial bug report : http://bugs.gentoo.org/show_bug.cgi?id=164694
I first noticed this while playing with wireshark. I can reproduce it with some kernels between v2.6.18 and v2.6.19 at the command line with tcpdump:

$>tcpdump udp -A -vv -i eth0 -l -q -p | grep UDP | grep 'bad udp cksum'

Unfortunately the behaviour can only be reproduced within a LAN (100 MBit Ethernet) and not at home (4 MBit DSL), and it's not possible to reproduce the issue with a user mode linux :-(

My first attempt to bisect the first bad commit was ... unsuccessful. I started with bisect v2.6.18 ... v2.6.19, but the resulting commit gave me a kernel which panicked instead of booting. I made some attempts like "$>git reset --hard HEAD~30" but I didn't get a bootable kernel.

The next attempt was "$> git bisect start drivers/net/", but after some steps I made a mistake (typed "$> git bisect good" instead of "$> git bisect bad"). I edited the .git/BISECT_LOG file (cut the last 2 lines) and ran "$> git bisect replay .git/BISECT_LOG", but the result was that the bisected tree was reset and the file .git/BISECT_LOG was lost :-(
Maybe I'm completely off here... what if the hardware sets the checksum for every packet? That means that tcpdump can never see the correct checksum before the packet goes out. AFAIK all e1000 hw sets the checksum in hardware, so this is kind of expected. If this is a bug at all, it is that tcpdump doesn't know about the hw csum offload capabilities of the NIC it's tracing. This "issue" should show up for any NIC that does HW csum offload on transmit.
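To illustrate the point about sniffing before the hardware has filled in the checksum, here is a minimal Python sketch of the check a sniffer performs on a UDP-over-IPv4 datagram. The addresses, ports and payload are hypothetical, and the value left in the checksum field on the sending host (here the pseudo-header sum only) is an assumption made for illustration, not something confirmed in this report.

```python
import socket
import struct

def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum of data, padded to an even length."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total

def pseudo_header(src_ip: str, dst_ip: str, udp_len: int) -> bytes:
    """IPv4 pseudo-header that the UDP checksum also covers."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, socket.IPPROTO_UDP, udp_len))

def udp_checksum_ok(src_ip: str, dst_ip: str, segment: bytes) -> bool:
    """Verify the checksum of a UDP header plus payload."""
    if segment[6:8] == b"\x00\x00":
        return True  # in IPv4 a zero field means "no checksum computed"
    total = ones_complement_sum(pseudo_header(src_ip, dst_ip, len(segment)) + segment)
    return total == 0xFFFF

# Hypothetical datagram, for illustration only.
SRC, DST = "192.168.0.2", "192.168.0.1"
payload = b"ping"
length = 8 + len(payload)

def build(checksum_field: int) -> bytes:
    return struct.pack("!HHHH", 123, 123, length, checksum_field) + payload

# Assumption: before offload the field holds only the pseudo-header sum.
placeholder = ones_complement_sum(pseudo_header(SRC, DST, length))
# What the NIC would put on the wire: complement of the sum over everything.
final = (0xFFFF - ones_complement_sum(pseudo_header(SRC, DST, length) + build(0))) or 0xFFFF

on_sender = udp_checksum_ok(SRC, DST, build(placeholder))  # what a local capture sees
on_wire = udp_checksum_ok(SRC, DST, build(final))          # what the receiver sees
print(on_sender, on_wire)
```

Under these assumptions the capture on the sending host fails verification while the datagram on the wire passes, which matches the observation that only locally generated packets are flagged.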
Created attachment 10333
config

I have now tracked the problem down to the module iptable_nat. I reproduced the issue with kernel 2.6.18 (kernel config attached). After booting into that kernel the command

$>tcpdump udp -A -vv -i eth0 -l -q -p | grep UDP | grep 'bad udp cksum'

gives a lot of output if ntpd is started and some DNS queries are made. If I then run

$>modprobe iptable_nat

I get no subsequent bad UDP packets.
Yes, of course that makes the problem go away. DUH, netfilter is too stupid ;) to know that the outgoing interface will do the checksum offloading for us, and therefore always calculates the checksum when translating addresses, because it *has* to - it just modified the packet! You just found a case where we can optimize iptable_nat to _NOT_ recalculate the csum for us and let the hardware do it.
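The reason address translation invalidates the checksum is that the UDP checksum covers a pseudo-header containing the source and destination IP addresses. A minimal Python sketch, with hypothetical addresses, ports and payload (this is not netfilter's code, only the arithmetic behind it):

```python
import socket
import struct

def udp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Compute the UDP checksum for a header plus payload over IPv4."""
    # The checksum field itself counts as zero during the computation.
    segment = segment[:6] + b"\x00\x00" + segment[8:]
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, socket.IPPROTO_UDP, len(segment)))
    data = pseudo + segment
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries
    checksum = ~total & 0xFFFF
    return checksum or 0xFFFF  # zero is reserved for "no checksum"

# Hypothetical datagram: ports 123 -> 123, 4-byte payload.
segment = struct.pack("!HHHH", 123, 123, 12, 0) + b"ping"

before = udp_checksum("192.168.0.2", "192.168.0.1", segment)  # original source
after = udp_checksum("10.0.0.1", "192.168.0.1", segment)      # source rewritten by NAT
print(hex(before), hex(after))
```

The two values differ although the UDP header and payload are identical, so whoever rewrites the address must either recompute the checksum in software or leave that work to the NIC.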
Ok, with kernel 2.6.20 iptable_nat doesn't seem to calculate the checksum - now all sniffed UDP packets have a wrong checksum - as expected, yes? If that's the case I'll close this bug. (BTW, the TCP checksums are not affected, are they?)
The TCP checksum should also be missing, but tcpdump might notice that and not complain about it. In any case: if your receiving end sees the right checksums then everything is working the way it should. Please close this issue :)