There appears to be a serious problem with the "atl1e" driver supporting the Attansic Technology Corp. Atheros AR8121/AR8113/AR8114 PCI-E Ethernet Controller.

For me, the problem occurred when I was copying hundreds of gigabytes of ISO image files from one system to another using ssh's "scp" command. At random points during copying I would get the error "Corrupted MAC on input", which then terminated the scp command. This "test" was run about 6 times, and each time it failed at some random point.

The software: Fedora 11 preview with "latest" updates and the 2.6.29.4-167.fc11.x86_64 kernel.

The hardware: ASUS M4A78 PRO motherboard, AMD Phenom II 940 processor (3 GHz, four cores), 8 GB system memory. The Atheros Ethernet Controller is integrated on the motherboard. (I will be attaching the output of "lspci".)

Why do I believe it is the driver:

1. I installed Fedora 10 running the 2.6.27.24-170.2.68.fc10.x86_64 kernel. I again ran a half dozen tests with NO failures.

2. I installed Fedora 11 preview with updates on another system (4400 dual processor) with a Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10) NIC. I then ran 4 copy tests with NO failures.

3. Finally, I installed a new PCI Express NIC on the Phenom system: D-Link System Inc DGE-560T PCI Express Gigabit Ethernet Adapter (rev 13). I then ran 8 copy tests with NO failures.

Conclusion: major problem in the atl1e driver.

Although I did not test this and thus have no proof, I suspect that copying large amounts of data to this system via atl1e with something like ftp would also result in corrupted data, but the only way to detect it would be by checksumming the files.
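For anyone wanting to check already-copied files, here is a minimal sketch of the checksumming idea, assuming Python 3 is available on both machines. The function names (sha256_of_file, build_manifest, diff_manifests) are hypothetical and are not part of any tool mentioned in this report. The idea is to build a manifest of SHA-256 digests for the source tree and for the copied tree, then list the files whose digests disagree.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_of_file(full)
    return manifest


def diff_manifests(source, dest):
    """Return sorted relative paths that are missing or differ in dest."""
    return sorted(p for p, h in source.items() if dest.get(p) != h)
```

In practice build_manifest would be run once on each machine and the two dictionaries compared; an empty list from diff_manifests would mean that every source file arrived with an identical digest.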
Created attachment 21634: lspci -v output showing both old and new NIC interfaces
Cc maintainers.
Duplicate of bug 12282?
It sure looks like a dup to me, but I will leave this open for someone a lot more expert than me to judge.

Note: all "tests" that I ran consisted of attempting to copy approximately 600GB from one system to another. The link between the two systems is a Netgear gigabit switch with short (~ 3m) cables. The "to" system was always the one with the atl1e driver.
Just to capture the maintainer's response in the bug report...

On Fri, 5 Jun 2009 12:44:19 +0800
Jie Yang <Jie.Yang@Atheros.com> wrote:

> On Friday, June 05, 2009 7:03 AM
> Jay Cliburn <jcliburn@gmail.com> wrote:
[...]
> > Jie,
> >
> > Could you please look into these reports of corruption?
> >
>
> sure, I will try to reproduce this bug first.
>
> Best wishes
> jie
Reply-To: Jie.Yang@Atheros.com

On Friday, June 05, 2009 8:50 PM
Jay Cliburn <jcliburn@gmail.com> wrote:

> Just to capture the maintainer's response in the bug report...
>
> On Fri, 5 Jun 2009 12:44:19 +0800
> Jie Yang <Jie.Yang@Atheros.com> wrote:
>
> > On Friday, June 05, 2009 7:03 AM
> > Jay Cliburn <jcliburn@gmail.com> wrote:
> [...]
> > > Jie,
> > >
> > > Could you please look into these reports of corruption?
> > >
> >
> > sure, I will try to reproduce this bug first.
> >
> > Best wishes
> > jie
>

Oh, I failed to reproduce this bug on my platform.

Mainboard: ASUS M3A79-T Deluxe
CPU: AMD Phenom(tm) 9950 Quad-Core Processor
Mem: 6G
Software platform: 2.6.29.1-102.fc11.x86_64

I used scp to copy about 4GB, and it succeeded.

[root@localhost ~]# scp /tmp/Fedora-11-Preview-x86_64-DVD.iso root@192.168.0.1:/dev/null
Address 192.168.0.1 maps to leo-pc.users.atheros.com, but this does not map back to the address - POSSIBLE BREAK-IN ATTEMPT!
root@192.168.0.1's password:
Fedora-11-Preview-x86_64-DVD.iso 100% 4397MB 36.3MB/s 02:01
[root@localhost ~]# ifconfig eth7
eth7      Link encap:Ethernet  HWaddr 00:13:74:12:14:01
          inet addr:192.168.0.2  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::213:74ff:fe12:1401/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1692161 errors:0 dropped:0 overruns:0 frame:0
          TX packets:14447145 errors:0 dropped:0 overruns:0 carrier:1
          collisions:0 txqueuelen:1000
          RX bytes:123255824 (117.5 MiB)  TX bytes:20737024713 (19.3 GiB)
          Interrupt:30

Attached is the detailed info ("pcis, cpuinfo, meminfo").

Can you give me some advice on how to reproduce this bug?
Created attachment 21826: cpuinfo
Created attachment 21827: meminfo

OK, lspci -v output, /proc/cpuinfo and /proc/meminfo are now attached.

My test involved copying a lot of data with scp: 182GB (not just 4GB). Sometimes the error would occur almost immediately, but other times it took a while.

I suspect (I have not done this) that a test could be constructed using netcat (nc) to repeatedly transfer a file with a known checksum and then check whether the received copy differs. I also suspect (again, I have not looked or tested) that other data handled by scp is being corrupted besides that which produces the "Corrupted MAC on input" error.

Do you need help setting up such a netcat test? After you transferred the 4GB file, did you run a checksum to see whether it was exactly the same as the original file?
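The repeated-transfer test suggested above could look roughly like the following. This is only a sketch under stated assumptions: it uses plain Python sockets instead of netcat, the function names (receive_digest, run_transfers) are made up for illustration, and sender and receiver run in one process over loopback. A real test would run the receiving half on the machine with the atl1e NIC and the sending half on the other machine.

```python
import hashlib
import socket
import threading


def receive_digest(server_sock, results):
    """Accept one connection, read until EOF, and record the SHA-256 digest."""
    conn, _addr = server_sock.accept()
    digest = hashlib.sha256()
    with conn:
        while True:
            data = conn.recv(65536)
            if not data:
                break
            digest.update(data)
    results.append(digest.hexdigest())


def run_transfers(payload, rounds, host="127.0.0.1"):
    """Send payload `rounds` times over TCP; return the number of mismatches."""
    expected = hashlib.sha256(payload).hexdigest()
    mismatches = 0
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind((host, 0))  # port 0: let the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        for _ in range(rounds):
            results = []
            receiver = threading.Thread(
                target=receive_digest, args=(server, results)
            )
            receiver.start()
            # Closing the client socket signals EOF to the receiver.
            with socket.create_connection((host, port)) as client:
                client.sendall(payload)
            receiver.join()
            if results != [expected]:
                mismatches += 1
    return mismatches
```

Over loopback this only exercises the harness itself, since no NIC is involved. Any non-zero count from a run across the real link would mean the received bytes differed from what was sent, even without an scp error.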
Created attachment 22168: lspci -v output

I am experiencing this same problem on my hardware:

ASUS P5QL Pro
Intel Core 2 Duo E6600
4GB RAM
Fedora 11, running 2.6.29.5-191.fc11.x86_64

I've managed to reproduce this error by logging in to the new computer over SSH, and by transferring files to the computer with scp.

This is definitely a problem with the atl1e driver.
[Adding Atheros maintainer to cc list.]

On Wed, 1 Jul 2009 17:01:08 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=13404
>
> --- Comment #9 from Ville Törhönen <ville@torhonen.fi> 2009-07-01 17:01:05 ---
> Created an attachment (id=22168)
> --> (http://bugzilla.kernel.org/attachment.cgi?id=22168)
> lspci -v output
>
> I am experiencing this same problem on my hardware:
>
> ASUS P5QL Pro
> Intel Core 2 Duo E6600
> 4GB RAM
> Fedora 11, running 2.6.29.5-191.fc11.x86_64
>
> I've managed to reproduce this error by logging to the new computer by SSH,
> and by transferring files to the computer by SCP.
>
> This is definitely a problem with the atl1e driver.
>
Same here on an Aspire 6530G laptop (AMD Turion X2). I tried switching off TSO and booting with maxcpus=1. Neither had any effect.

The problem disappears when the machine is under heavy load. Running a loop like 'while true; do false; done' while the network activity takes place makes the problem go away.
I had the same problem with the same hardware as the original poster (Asus M4A78 PRO, AMD Phenom II 940). After changing network cables and switches, I finally tried a BIOS update, and that fixed it. So it doesn't seem to be a driver problem, at least for this hardware setup.
My controller is "Atheros Communications AR8121/AR8113/AR8114 Gigabit or Fast Ethernet (rev b0)", as per lspci, and I've been having the same issue. However, it appears to have been fixed (at least for me) with the latest Atheros AR81 driver. The package name is "AR81Family-linux-v1.0.1.14.tar.gz", and it's available from http://partner.atheros.com/Drivers.aspx.

modinfo for this new atl1e module now displays "1.0.1.14" in the version field, while the one that came with my kernel 2.6.37.4 had "1.0.0.7-NAPI". Perhaps the in-kernel driver module needs to be updated?

(This could probably also fix the issues from bug 12282 and maybe bug 27712.)
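To check which driver version a given machine is running, the version field can be pulled out of modinfo output. The sketch below assumes the usual "version: <value>" line format of modinfo; the helper name module_version is hypothetical.

```python
import re
import subprocess


def module_version(modinfo_text):
    """Return the value of the 'version:' field in modinfo output, or None."""
    # ^ with re.MULTILINE anchors at line start, so 'srcversion:' is skipped.
    match = re.search(r"^version:\s*(\S+)", modinfo_text, re.MULTILINE)
    return match.group(1) if match else None


def installed_atl1e_version():
    """Run 'modinfo atl1e' and return its version field (needs the module)."""
    out = subprocess.run(
        ["modinfo", "atl1e"], capture_output=True, text=True, check=True
    ).stdout
    return module_version(out)
```

Only module_version is pure text processing. installed_atl1e_version needs the modinfo tool and the atl1e module to be present, and raises an error otherwise.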
If this is still seen on modern kernels, please re-open or update this bug.