Most recent kernel where this bug did not occur: N/A
Distribution: Debian (but stock vanilla kernel.org kernel tree)
Hardware Environment: Intel Core 2 Duo E6300 on Gigabyte GA-945GM-S2 mainboard, NIC under test running at 100Mbps, full duplex
Software Environment: Debian testing installation (last updated early Nov 2006)
Problem Description: About 2/3 of received packet data is corrupted when using the r8169 driver to operate the onboard Realtek 8111B gigabit ethernet chip.

Steps to reproduce:
modprobe r8169, and configure the interface appropriately for operation on your ethernet
Ping the NIC from another machine
Notice that over 60% of the pings don't return

An examination of the traffic with wireshark running on the test machine reveals that the machine is not returning most of the pings because the received packet data is being corrupted. The nature of the corruption is that the first 4 bytes of the corrupted packet data are simply missing (and there are 4 bytes of garbage appended at the end, so the overall packet size stays the same). Smells like a fencepost error somewhere...

I have not tried this with older kernels, but before 2.6.19 the r8169 driver apparently did not support this chip.
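For readers who want to check a capture for the same pattern, here is a minimal Python sketch of the corruption as described above: the first 4 bytes dropped, 4 arbitrary bytes appended, length unchanged. The function names and the sample frame are hypothetical and are not taken from the driver or from the attached capture.

```python
SHIFT = 4  # number of leading bytes reported missing in the description above


def simulate_corruption(frame: bytes, garbage: bytes = b"\xde\xad\xbe\xef") -> bytes:
    """Drop the first len(garbage) bytes and append garbage, keeping the length."""
    return frame[len(garbage):] + garbage


def looks_shifted(sent: bytes, received: bytes, shift: int = SHIFT) -> bool:
    """True if `received` equals `sent` with its first `shift` bytes missing
    and `shift` unrelated bytes appended (same overall length)."""
    if len(sent) != len(received) or len(sent) <= shift:
        return False
    return received[:-shift] == sent[shift:]


# Hypothetical 64-byte frame, used only for illustration.
frame = bytes(range(64))
bad = simulate_corruption(frame)
print(looks_shifted(frame, bad))
```

Comparing the payload of a sent ping with what the test machine captured in this way would separate the 4-byte shift from other kinds of damage.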
Created attachment 9493 [details] Tarball containing lspci output from machine under test and saved wireshark capture output showing packet corruption

The attached tarball contains the following:

1. lspci output of the machine under test, showing specific data about the Realtek 8111B NIC where the problem is happening.

2. A saved packet capture file ("bad-pings"). Load the packet capture in ethereal / wireshark and look at the packet contents. The machine under test is where this data was captured. Its IP address was 192.168.27.10. The machine sending the ping requests was 192.168.27.1. The packets labelled "LLC" by wireshark are examples of corrupted data. You can count those up and see that they correspond to the missing ping requests (each gap in the ICMP sequence numbers matches the intervening count of these packets).
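The counting check described above can be sketched in Python. This is only an illustration: the sequence numbers and the frame count below are made-up sample values, not data from the attached capture, and the helper assumes the ICMP sequence numbers do not wrap around within the capture.

```python
def missing_sequences(seen):
    """Return the ICMP sequence numbers absent between the lowest and highest
    value in `seen`. Assumes no 16-bit wraparound inside the capture."""
    present = set(seen)
    if not present:
        return []
    return [n for n in range(min(present), max(present) + 1) if n not in present]


# Hypothetical values: sequence numbers of pings that were answered, and the
# number of frames the capture tool labelled "LLC" over the same interval.
answered = [1, 2, 5, 6, 9]
llc_frames = 4

gaps = missing_sequences(answered)
print(gaps, len(gaps) == llc_frames)
```

If the number of missing sequence numbers equals the number of "LLC" frames, that supports reading each "LLC" frame as a corrupted ping request.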
Created attachment 9509 [details] sync with Realtek's init sequence
Created attachment 9510 [details] debug helper

Please try the previous patch. I am not very confident that it will change things, but it deserves to be tested.

If it does not work better, please apply the debug patch above on top of it, start a ping test and capture the traffic. If you avoid unrelated traffic on the link, it should not spam your logs too much. The output of the test (capture file + kernel log + ping transmission rate) would be welcome for something like one or two hundred ICMP packets.

-- 
Ueimor
Created attachment 9514 [details] Test results with patches applied

I applied the initialization patch (which, as expected, didn't help) and the patch that added some debug code. Results are in the attachment. There's a readme file within the tarball that describes its contents.
Created attachment 9550 [details] align more carefully

- 69% packet loss.
- 1/3 of skb->data aligned on a 16-byte boundary
  2/3 of (skb->data - 4) aligned on a 16-byte boundary

No need to shout, it seems crystal clear.

Can you add the patch above on top of the previous series and reproduce the last test? Your .config would be welcome too.

-- 
Ueimor
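The alignment split reported in the comment above can be tallied with a short Python sketch. The addresses below are hypothetical stand-ins for logged skb->data values, and the category names are invented for illustration; this is not the debug patch from the attachment.

```python
from collections import Counter


def classify(addr: int, boundary: int = 16, offset: int = 4) -> str:
    """Classify a buffer address by the two alignments named in the comment."""
    if addr % boundary == 0:
        return "aligned"        # skb->data sits on a 16-byte boundary
    if (addr - offset) % boundary == 0:
        return "offset-by-4"    # (skb->data - 4) sits on a 16-byte boundary
    return "other"


def tally(addresses):
    """Count how many addresses fall into each alignment category."""
    return Counter(classify(a) for a in addresses)


# Hypothetical addresses, chosen to mirror the 1/3 vs 2/3 split.
sample = [0x1000, 0x1004, 0x1014]
print(tally(sample))
```

A share of offset-by-4 buffers close to the observed packet loss rate would be consistent with the loss following the buffer alignment, which is what the comment suggests.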
Created attachment 9567 [details] align even more carefully

Oops

-- 
Ueimor
Created attachment 9568 [details] Test results with id=9567 patch applied

All requested data is in the tar-bzip file.
It seems to perform better.

Can you apply patch #9567 on top of the latest 2.6.19-rc6 (without the debug stuff) and check that nothing bad happens?

-- 
Ueimor
Reply-To: isely@isely.net

Patch applied, new kernel (2.6.19-rc6) built, and it works.

Any chance this can get into 2.6.19?...

-Mike
bugme-daemon@bugzilla.kernel.org <bugme-daemon@bugzilla.kernel.org> :
[...]
> Any chance this can get into 2.6.19?...

A patch that I had submitted two weeks ago has been postponed to 2.6.20. It was related to the link management. I can guess the answer if a patch which messes with the alignment in the skb is submitted.
The patch has been committed in Linus's trunk. For details, see:
http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=cc9f022d97d08e4e36d38661857991fe91447d68

-- 
Ueimor