Bug 13404
| Summary: | with atl1e: Corrupted MAC on input | | |
|---|---|---|---|
| Product: | Drivers | Reporter: | Gene Czarcinski (gczarcinski) |
| Component: | Network | Assignee: | drivers_network (drivers_network) |
| Status: | RESOLVED OBSOLETE | | |
| Severity: | high | CC: | account, akpm, alan, cebbert, csnook, dc2, jcliburn, nyxkn, r_herrma, will.bryant |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 2.6.29.4-167.fc11.x86_64 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
| Attachments: | lspci -v output showing both old and new NIC interfaces; cpuinfo; meminfo; lspci -v output | | |
Description
Gene Czarcinski
2009-05-30 20:10:31 UTC
Created attachment 21634 [details]
lspci -v output showing both old and new NIC interfaces
Cc maintainers.

It sure looks like a dup to me but I will leave this open for someone a lot more expert than me to judge.

Note: all "tests" that I ran consisted of attempting to copy approximately 600GB from one system to another. The link between the two systems is a Netgear gigabit switch with short (~ 3m) cables. The "to" system was always the one with the atl1e driver.

Just to capture the maintainer's response in the bug report...

On Fri, 5 Jun 2009 12:44:19 +0800
Jie Yang <Jie.Yang@Atheros.com> wrote:

> On Friday, June 05, 2009 7:03 AM
> Jay Cliburn <jcliburn@gmail.com> wrote:
[...]
> > Jie,
> >
> > Could you please look into these reports of corruption?
> >
>
> sure, I will try to reproduce this bug first.
>
> Best wishes
> jie

Reply-To: Jie.Yang@Atheros.com

On Friday, June 05, 2009 8:50 PM
Jay Cliburn <jcliburn@gmail.com> wrote:

> Just to capture the maintainer's response in the bug report...
>
> On Fri, 5 Jun 2009 12:44:19 +0800
> Jie Yang <Jie.Yang@Atheros.com> wrote:
>
> > On Friday, June 05, 2009 7:03 AM
> > Jay Cliburn <jcliburn@gmail.com> wrote:
> [...]
> > > Jie,
> > >
> > > Could you please look into these reports of corruption?
> > >
> >
> > sure, I will try to reproduce this bug first.
> >
> > Best wishes
> > jie
>

Oh, I failed to reproduce this bug on my platform.

Mainboard: ASUS M3A79-T Deluxe
CPU: AMD Phenom(tm) 9950 Quad-Core Processor
Mem: 6G
Software platform: 2.6.29.1-102.fc11.x86_64

I used scp to copy about 4GB, and it succeeded.

[root@localhost ~]# scp /tmp/Fedora-11-Preview-x86_64-DVD.iso root@192.168.0.1:/dev/null
Address 192.168.0.1 maps to leo-pc.users.atheros.com, but this does not map back to the address - POSSIBLE BREAK-IN ATTEMPT!
root@192.168.0.1's password:
Fedora-11-Preview-x86_64-DVD.iso 100% 4397MB 36.3MB/s 02:01
[root@localhost ~]# ifconfig eth7
eth7      Link encap:Ethernet  HWaddr 00:13:74:12:14:01
          inet addr:192.168.0.2  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::213:74ff:fe12:1401/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1692161 errors:0 dropped:0 overruns:0 frame:0
          TX packets:14447145 errors:0 dropped:0 overruns:0 carrier:1
          collisions:0 txqueuelen:1000
          RX bytes:123255824 (117.5 MiB)  TX bytes:20737024713 (19.3 GiB)
          Interrupt:30

Attached is the detailed info about "pcis, cpuinfo, meminfo".

Can you give me some advice on how to reproduce this bug?

Created attachment 21826 [details]
cpuinfo
Created attachment 21827 [details]
meminfo
OK, lspci -v output, /proc/cpuinfo and /proc/meminfo now attached.
My test involved copying a lot of data with scp: 182GB (not just 4GB).
Sometimes the error would occur almost immediately but other times it took a while.
I suspect (I have not done this) that a test could be constructed using netcat (nc) to repeatedly transfer a file with a known checksum and then test that file to see if it was different.
Anyway, I suspect [I have not looked/tested] that other data handled by scp is being corrupted besides that which produces the "Corrupted MAC on Input" error.
Do you need help setting up such a netcat test? After you transferred the 4GB file, did you run a checksum to see if it was exactly the same as the original file?
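The netcat-style check suggested above can be sketched in Python instead of nc. This is only an illustration under stated assumptions, not a test anyone in this thread ran: the function names (`sha256_file`, `open_listener`, `receive_digest`, `send_file`) are made up for the example, SHA-256 stands in for "a known checksum", and the data goes over plain TCP so that no ssh MAC check aborts the transfer first.

```python
import hashlib
import socket

CHUNK = 64 * 1024  # read/send size in bytes (arbitrary choice for this sketch)


def sha256_file(path):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            digest.update(block)
    return digest.hexdigest()


def open_listener(host, port):
    """Create a listening TCP socket (run this on the receiving machine)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    return server


def receive_digest(server):
    """Accept one connection and hash every byte until the sender closes."""
    digest = hashlib.sha256()
    conn, _ = server.accept()
    with conn:
        while True:
            block = conn.recv(CHUNK)
            if not block:
                break
            digest.update(block)
    return digest.hexdigest()


def send_file(path, host, port):
    """Stream the file to host:port over plain TCP, then close the socket."""
    with socket.create_connection((host, port)) as conn, open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            conn.sendall(block)
```

On the receiving (atl1e) machine one would call `open_listener("0.0.0.0", 5001)` and then `receive_digest(server)` in a loop; on the sending machine, `send_file(path, "receiver-host", 5001)` once per round, comparing each received digest against `sha256_file(path)` computed on the sender. The port number and host name here are placeholders. A mismatch would mean corrupted data reached the application despite the TCP checksum.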
Created attachment 22168 [details]
lspci -v output
I am experiencing this same problem on my hardware:
ASUS P5QL Pro
Intel Core 2 Duo E6600
4GB RAM
Fedora 11, running 2.6.29.5-191.fc11.x86_64
I've managed to reproduce this error by logging to the new computer by SSH, and by transferring files to the computer by SCP.
This is definitely a problem with the atl1e driver.
[Adding Atheros maintainer to cc list.]

On Wed, 1 Jul 2009 17:01:08 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=13404
>
> --- Comment #9 from Ville Törhönen <ville@torhonen.fi> 2009-07-01 17:01:05 ---
> Created an attachment (id=22168)
> --> (http://bugzilla.kernel.org/attachment.cgi?id=22168)
> lspci -v output
>
> I am experiencing this same problem on my hardware:
>
> ASUS P5QL Pro
> Intel Core 2 Duo E6600
> 4GB RAM
> Fedora 11, running 2.6.29.5-191.fc11.x86_64
>
> I've managed to reproduce this error by logging to the new computer by SSH, and
> by transferring files to the computer by SCP.
>
> This is definitely a problem with the atl1e driver.
>

Same here on an Aspire 6530G laptop (AMD Turion X2). I tried switching off TSO and booting with maxcpus=1. This had no effect. The problem disappears when the machine is under heavy load: running a loop like 'while true; do false; done' while the network activity takes place makes the problem disappear.

Had the same problem with the same hardware as the original poster (Asus M4A78 PRO, AMD Phenom II 940). After changing network cables and switches, I finally tried a BIOS update and that fixed it. So it doesn't seem to be a driver problem, at least for this hardware setup.

My controller is "Atheros Communications AR8121/AR8113/AR8114 Gigabit or Fast Ethernet (rev b0)", as per lspci, and I've been having the same issue. However, it appears to have been fixed (at least for me) with the latest Atheros AR81 driver. The package name is "AR81Family-linux-v1.0.1.14.tar.gz", and it's available from http://partner.atheros.com/Drivers.aspx. modinfo of this new atl1e module now displays "1.0.1.14" in the version field, while the one that came with my kernel 2.6.37.4 had "1.0.0.7-NAPI". Maybe the kernel driver module needs to be updated?
(This could probably also fix the issues from bug 12282 and maybe bug 27712)

If this is still seen on modern kernels then please re-open/update
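For anyone comparing the in-kernel atl1e module against the vendor package as described in the AR81 driver comment, the two version strings reported there ("1.0.0.7-NAPI" and "1.0.1.14") can be compared with a short script. This is a rough sketch: `parse_version` and `installed_driver_version` are hypothetical helper names, and it assumes `modinfo` is on the PATH and that the module exposes a version field, which may not be the case on every kernel.

```python
import re
import subprocess


def parse_version(text):
    """Turn '1.0.0.7-NAPI' into (1, 0, 0, 7); any non-numeric suffix is ignored."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"no leading version number in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def installed_driver_version(module="atl1e"):
    """Return the version field that modinfo reports for the module."""
    result = subprocess.run(
        ["modinfo", "-F", "version", module],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()
```

With these helpers, `parse_version(installed_driver_version()) < parse_version("1.0.1.14")` would indicate that the installed module is older than the vendor release mentioned in that comment.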