Bug 12282 - Network data corruption on eee 1000
Summary: Network data corruption on eee 1000
Status: RESOLVED OBSOLETE
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_network@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-12-23 21:24 UTC by walken
Modified: 2013-12-10 16:45 UTC
CC List: 13 users

See Also:
Kernel Version: 2.6.39
Tree: Mainline
Regression: No


Attachments
workaround from comment #18 (583 bytes, patch)
2011-08-29 22:24 UTC, Jonathan Nieder

Description walken 2008-12-23 21:24:44 UTC
Latest working kernel version: unknown
Earliest failing kernel version: 2.6.28-rc8
Distribution: debian lenny
Hardware Environment: eee 1000, no hardware changes except for a 2GB memory upgrade.
Software Environment:
Problem Description: Intermittent data corruption over wired network


Running debian lenny on my eee 1000, I've seen occasional scp failures where scp would complain about a corrupted MAC when copying files around on my local network. Also, when compiling things over NFS, my source files occasionally appeared corrupted on the client (while they were still fine on the server). And when I tried running things in an nfsroot environment (I know this sounds silly for a laptop, but I see it as a good way to try new software without having to install it on disk), I got occasional segfaults in various processes. Since I've not seen such failures when running with a disk-based root, I blame them all on the networking subsystem.


I've been running the following command as a way to try and reproduce the problem:

for x in 0 1 2 3 4 5 6 7 8 9; do for y in 0 1 2 3 4 5 6 7 8 9; do for z in 0 1 2 3 4 5 6 7 8 9; do echo $x$y$z; scp server:shared/net_test/data1GB /tmp || sleep 36000; date; done; done; done
000
data1GB                                       100% 1005MB   5.2MB/s   03:15    
Tue Dec 23 20:17:36 PST 2008
001
data1GB                                       100% 1005MB   5.2MB/s   03:12    
Tue Dec 23 20:20:49 PST 2008
002
data1GB                                       100% 1005MB   5.2MB/s   03:13    
Tue Dec 23 20:24:03 PST 2008
003
data1GB                                       100% 1005MB   6.4MB/s   02:38    
Tue Dec 23 20:26:42 PST 2008
004
data1GB                                        98%  994MB   5.4MB/s   00:02 ETA
Disconnecting: Corrupted MAC on input.
lost connection

The failures don't always happen at the same place, and they might be slightly more likely soon after boot, but I'm not sure about that.

Even after scp detected some data corruption, ifconfig does not report any errors:

eth0      Link encap:Ethernet  HWaddr 00:22:15:85:7c:94  
          inet addr:10.3.0.1  Bcast:10.255.255.255  Mask:255.0.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3683950 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1432256 errors:0 dropped:0 overruns:0 carrier:2
          collisions:0 txqueuelen:1000 
          RX bytes:1246310892 (1.1 GiB)  TX bytes:101092933 (96.4 MiB)
          Interrupt:59 

(Note the RX bytes value is also wrong, since I transferred almost 5GB above.
I believe this is because the value wraps around after 4GB? Also, /proc/interrupts reports >3 million interrupts (PCI-MSI-edge) on eth0.)
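
The wrap-around guess above can be checked with a little arithmetic. The following is a minimal sketch in Python, assuming the RX byte counter is an unsigned 32-bit value that wrapped exactly once, and that scp's "MB" means 2**20 bytes. Both are assumptions that the ifconfig output does not confirm, and the variable names are made up for illustration.

```python
# Assumption: the counter is unsigned 32-bit and has wrapped exactly once.
WRAP = 2 ** 32

# Value reported by ifconfig in the output above.
reported_rx_bytes = 1246310892

# scp progress above: four complete 1005MB transfers plus 994MB of the fifth.
# Assumption: scp's "MB" is 2**20 bytes.
payload_bytes = (4 * 1005 + 994) * 2 ** 20

# Undo one wrap and compare against the payload that scp reported.
unwrapped_rx_bytes = reported_rx_bytes + WRAP
overhead_bytes = unwrapped_rx_bytes - payload_bytes
overhead_percent = 100 * overhead_bytes / payload_bytes

print("unwrapped RX bytes:", unwrapped_rx_bytes)
print("scp payload bytes: ", payload_bytes)
print("excess over payload: %.1f%%" % overhead_percent)
```

Under those assumptions the unwrapped counter comes out a few percent above the scp payload, which is plausibly what protocol headers and ssh framing would add, so a single 32-bit wrap is at least consistent with the numbers. Any traffic received before the test started would also be included in the counter.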

I'm tempted to blame either the hardware or the newish atl1e network driver, but have no hard proof either way at this point.
Comment 1 walken 2008-12-23 21:32:40 UTC
I should add a few things:

* I have a few other clients accessing the same server and they don't encounter any data corruption.
* The server is on my own LAN. There is only a gigabit switch between the server and the eee 1000 client.
* I am a bit surprised that corruption happening in the wires or in the switch would not be caught by either the Ethernet CRC or the TCP checksum.
Comment 2 Anonymous Emailer 2008-12-23 23:20:46 UTC
Reply-To: akpm@linux-foundation.org


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

Comment 3 walken 2008-12-24 04:18:01 UTC
At this point I wonder if this could be an issue with marginal memory
timings, one that somehow only gets triggered by transfers through the
network adapter and never by accesses from the CPU. But is that
even possible?

Here are few additional data points I collected:

In order to see what the raw data looks like before scp complains about the
corrupted MAC, I decided to drop scp and use nfs + cp + md5sum:

cp /mnt/shared/net_test/data1GB /tmp; md5sum /tmp/data1GB
(/mnt/shared is an nfs3 over tcp mount, and /tmp is a tmpfs).

After a few tries I usually get the wrong md5sum in /tmp/data1GB,
I then copy the file back to the server, check that it arrived there
with the same corrupted md5sum as it had on the eee client side,
and use "cmp -l" to figure out what's different between the original
and the corrupted file.

Turns out that in all cases I've observed, the corrupted file had a
128-byte region with unexpected (garbage) contents. Not just single bits
being flipped, but the whole region being entirely different. The regions
were not necessarily aligned on a 128 byte boundary relative to the start
of the file, though.
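
Spotting such regions in a long "cmp -l" listing is easier if adjacent differing offsets are grouped into spans. The following is a minimal Python sketch of that idea, not the procedure actually used above. It relies only on the documented "cmp -l" output format (a 1-based decimal byte number followed by the two differing byte values in octal). The function name, the max_gap tolerance (bytes inside a garbage block can match the original by chance), and the sample lines are all made up for illustration.

```python
def group_runs(cmp_lines, max_gap=8):
    """Group 1-based offsets from `cmp -l` output into (start, end, length) spans.

    Offsets that are at most max_gap bytes apart are treated as one span,
    because some bytes inside a corrupted block may match by coincidence.
    """
    offsets = sorted(int(line.split()[0]) for line in cmp_lines if line.strip())
    runs = []
    for off in offsets:
        if runs and off - runs[-1][1] <= max_gap:
            runs[-1][1] = off          # extend the current span
        else:
            runs.append([off, off])    # start a new span
    return [(start, end, end - start + 1) for start, end in runs]


# Synthetic example lines in `cmp -l` format (not data from this report):
# one 128-byte block with a single coincidentally matching byte, plus one
# isolated differing byte elsewhere.
sample = ["%d 101 142" % off for off in range(1001, 1129) if off != 1050]
sample.append("5000 100 102")

for start, end, length in group_runs(sample):
    print("offsets %d-%d (%d bytes)" % (start, end, length))
```

With real data the lines would come from something like "cmp -l original corrupted" saved to a file and read line by line.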

At this point I wondered "bad memory?" and I swapped back the original 1GB
stick that came with the EEE 1000, instead of the 2GB upgrade I had installed
on the first day. Turns out that only made things worse! With that stick,
I still see some 128-byte regions getting corrupted, and I additionally
see a few bytes here and there (always at an offset multiple of 4 relative
to the start of the file) having bit 0x02 set when they should not.
If I run md5sum on the /tmp file multiple times I will always get the
same hash, but it did take me 3 trials (with a 500MB file, my /tmp is
smaller now that I have only 1GB of memory) before I did end up with a
copy on the server that had the same hash as the corrupted /tmp file.
The two other copies had a few more 0x02 bits mistakenly set here and there.

Both memory sticks do check out fine with "memtester" (I have not tried
memtest86 yet), and I don't observe any trouble when not using the LAN.

Could this be a timing issue that would only show up when transferring
between memory and the network adapter? And if so, what can we even do
about it? I'm using bios version 0803 which is the most recent available
for the EEE 1000.

I won't be able to do much testing in the following week as I'll be away
from my LAN :) but I should be able to get wireless and read my email.

Comment 4 Jay Cliburn 2008-12-24 05:32:41 UTC
Do things improve if you turn off TSO in the atl1e driver?

ethtool -K eth0 tso off
Comment 5 Herbert Xu 2008-12-24 15:03:11 UTC
If turning off TSO does fix it, please also test with an MTU > 4096 to see whether the problem is in TSO or whether it's in page handling.
Comment 6 walken 2008-12-24 20:19:46 UTC
On Wed, Dec 24, 2008 at 07:32:36AM -0600, J. K. Cliburn wrote:
> Do things improve if you turn off TSO in the atl1e driver?
> 
> ethtool -K eth0 tso off

I'm currently away on vacation but I should be able to test this next week.

I will even try with both memory sticks, as the 1GB one seemed to give
more issues when using the wired network (but still worked fine when
just using the cpu).

Merry Christmas everyone ! :)
Comment 7 walken 2008-12-30 18:20:32 UTC
On Wed, Dec 24, 2008 at 07:32:36AM -0600, J. K. Cliburn wrote:
> Do things improve if you turn off TSO in the atl1e driver?
> 
> ethtool -K eth0 tso off

Seems to work:

I get "operation not supported" when trying this ethtool command.
However, I compiled 2.6.28 with a patch to not set NETIF_F_TSO and
NETIF_F_TSO6 into netdev->features and I've been able to transfer
180 GB with scp overnight without running into any corrupted MACs.

One thing I do not understand: I thought the tso option was only
meaningful on the sender side? In my case, the transfers are going
from the external server to the local atl1e based interface...
Comment 8 walken 2008-12-31 01:34:34 UTC
On Wed, Dec 24, 2008 at 07:32:36AM -0600, J. K. Cliburn wrote:
> Do things improve if you turn off TSO in the atl1e driver?
> 
> ethtool -K eth0 tso off

I said yes in a previous message, but I think I was confused.

I'm now running 2.6.28 with a driver change that removes NETIF_F_TSO and
NETIF_F_TSO6 from netdev->features. Last night I transferred 180 GB
with scp without any corrupted MACs, however today while compiling stuff
over NFS I got a 128-byte block of source code that was corrupted
(replaced with text from a different source file). So, my issue does not
seem to disappear even after disabling TSO support in the driver.
Also the issue seems to be somewhat intermittent, since I was not able
to trigger it at will with scp last night.... :/
Comment 9 Alan 2009-03-19 10:43:54 UTC
Try memtest86, although if it's dependent on both network and CPU load that may not turn up a slightly marginal system.
Comment 10 orion 2009-05-05 23:18:20 UTC
I've also encountered this identical problem on my Asus EEE 1000HE. It uses the same network driver as the original poster's machine.

I've done extensive testing on the issue and I believe the following is relevant:-

* It seems to happen much more often shortly after the boot sequence has completed. In fact, I can get the first MAC corruption message quite soon after bootup, but it's much more difficult to trigger afterwards.

* As for the original poster, ifconfig does not show any errors on the ethernet device after it has occurred.

* Lowering the transfer speed seems to delay the error (in terms of bytes), but does not prevent it. It always seems to eventually occur.

* I have confirmed by testing that there is nothing wrong with the cabling. In any case, to the best of my knowledge TCP/IP should pick up any issues on the wire and request a retransmission. I'm surprised it doesn't in this case.

* I have experienced the problem under both Debian unstable and Ubuntu 8.10 in the same manner.

* The EEE has undergone 10 separate passes under memtest86+ without flagging a single memory error. I also tried to use SCP with both the wireless and loopback interfaces without triggering a problem. I would think this rules out a general memory error, as I cannot see how any standard memory problem would affect only the ethernet interface.

* Switching BIOS versions seems to have no effect.

* I tried using Cygwin under Windows and could not trigger either this error, or any other that I could clearly link to this. Interestingly, network performance using Windows is seemingly significantly slower than Linux on this machine.

* I tried loading the machine during network transfers to see if that changed the error pattern. That seemed to make no difference.

Thanks
Comment 11 Bill McGonigle 2009-06-03 21:14:46 UTC
I've been seeing this on a couple ASUS boards (p5q pro and 1000he) under Fedora 11.  tso off didn't help.  'rx off tx off' isn't supported by this driver.  The machines have been memtest86+'ed.  I noted a pattern in the traffic on one transfer, which I haven't been able to reproduce consistently (I guess I don't understand the proper state characteristics yet).

That pattern is noted in this comment:
  https://bugzilla.redhat.com/show_bug.cgi?id=503288#c4

I think I also see corruption on different hardware, but I need to reproduce that again to convince myself.  I've taken to testing like:

dd if=/dev/urandom of=random.dd bs=1G count=3

and then copying that file via sneakernet and also netcat:

 netcat -l -p 1234 > random.net
 netcat f10-host 1234 < random.dd

and then running cmp and/or vbindiff on the pair of files.  Please critique the methodology if it's insufficient.
Comment 12 Bill McGonigle 2009-06-08 23:06:01 UTC
The pattern I mentioned in comment #11 is a red herring - it was due to a bad flash drive.  New methodology adds hashing the sneakernet'ed file on both ends of the transfer...
Comment 13 Bill McGonigle 2009-06-10 19:05:07 UTC
[aside: If somebody can make a determination of bug 13404's status as a dup of this one we could consolidate effort.]

I've been adding some info to the Redhat bug but seem to have neglected doing so here.  Here's what I know at this point:

I can reproduce this 100% on an ASUS eeePC 1000HE and an ASUS P5Q Pro.  The first is a single-core Atom and the latter is a Core2Quad.  Both use the atl1e driver.  Both pass > 10 cycles of memtest86+.  I've updated BIOS on the P5Q.

I'm currently generating a 3GB file from /dev/urandom, sneakernetting it between machines, using sha256sum to verify the sneakernet, and then using netcat over TCP to transfer the random file between the machines.  I then compare the transfers with sha256sum and use 'cmp -l' to list the errors when they occur.  I've only caught the problem when the atl1e machine is the receiving end of the netcat.  I thought I saw the problem on other network hardware but I've ruled those (a netgear PCI card in the P5Q and the eeePC's RT2860 wireless) out with this methodology.
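
The verify-and-compare step of this methodology can be sketched in Python as below. This is only an illustration of the idea, not the commands actually run above (those were sha256sum and 'cmp -l'). The function names and the chunk size are hypothetical, and the sketch assumes both files have the same length: bytes beyond the shorter file are not reported.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so a multi-GB file does not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differing_offsets(path_a, path_b, chunk_size=1 << 20):
    """Return the 0-based offsets at which two files differ.

    Only the common prefix is compared byte by byte; a length mismatch
    is not reported by this sketch.
    """
    offsets = []
    base = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break
            if a != b:
                offsets.extend(
                    base + i for i, (x, y) in enumerate(zip(a, b)) if x != y
                )
            base += max(len(a), len(b))
    return offsets
```

A typical use would be to compare sha256_of() for the sneakernetted and the network-transferred copies first, and to call differing_offsets() only when the hashes disagree. Note that 'cmp -l' numbers bytes from 1, while this sketch counts from 0.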

The error _rate_ is fairly low.  In transferring a 3GB file of random numbers on the eeePC I saw 128 single-byte errors; on the P5Q I saw 281 errors.  The raw cmp's are here, if it helps anybody:
eeePC: https://bugzilla.redhat.com/attachment.cgi?id=347106
P5Q: https://bugzilla.redhat.com/attachment.cgi?id=347107

The machines with problems are running Fedora 11, sporting kernel: 2.6.29.4-162.fc11.i586.  I can run Fedora 10 on the P5Q machine, and have not been able to reproduce the problems there yet.  It has kernel: 2.6.27.21-170.2.56.fc10.i686.PAE .

Now, walken mentioned memory transfers in comment #3.  I wanted to note this again because when I was trying to figure this out (as shown in my confused posts in comment #11 and comment #12), I assumed my flash drive was a reliable way to get the known-good file between machines.  It turns out that the flash drives are fine, but I experience data corruption writing to any flash drives (reading seems OK) on the same kernels that give me network corruption on the P5Q machine.  They seem to work fine on the older kernel.  It was this problem that showed the pattern mentioned in comment #11 - I've reproduced that again.  I also experienced trouble reading from a SATA DVD drive, though SATA disks on the same controller work perfectly.  My testing methodology on the USB drives is noted here: https://bugzilla.redhat.com/show_bug.cgi?id=503288#c8 .  Related may be the way this is sneaking past the TCP checksums.

So, I could have completely separate problems going on, or perhaps there's a common root cause, but I don't know enough to figure out which.
Comment 14 Bill McGonigle 2009-08-18 00:51:38 UTC
The P5Q problems seem to have been related to the BIOS undervolting the memory.  According to the ASUS forum, manually setting the voltage higher than spec is required for the board to actually provide the required voltage.
Comment 15 fboiteux 2009-09-02 06:27:48 UTC
  Hello,

I'm also hit by this problem. My computer is an Asus Eeepc 1002HA, with an atl1e Ethernet driver, and a Debian Lenny system with 2.6.30 kernel.

In my case, I can reproduce it [almost] everyday : when I wake up from hibernation (suspend to disk), I launch a firefox session (with a lot of tabs) on another computer via ssh, and the first time, firefox ends with a message :
"Disconnecting: Corrupted MAC on input."
The second try works [almost] every time…
Comment 16 Reiner Herrmann 2010-10-12 15:23:28 UTC
The same bug is also discussed on https://bugs.launchpad.net/ubuntu/+source/linux/+bug/60764

I can reproduce it on an Asus EeePC 1000HE with atl1e driver on kernel 2.6.35.7.
None of the proposed workarounds is working ("ethtool -K eth0 tso off", "ethtool -K eth0 rx off tx off").
The files are corrupted when transferring with either scp ("Packet corrupt") or nc (different md5sum).

The problem only occurs once until the next reboot (or sometimes after wakeup from standby). After it has occurred, subsequent transfers don't get corrupted.
Comment 17 Tomaž Šolc 2010-10-26 09:54:23 UTC
Hi everyone

I've been investigating this bug on an Asus EeePC 901 on kernel 2.6.26.

I'm using two simple programs and netcat to transfer a known data pattern from a desktop computer to the EeePC via a TCP connection. There is a single 100BASE-TX switch between computers. "producer" writes the pattern to stdout. "consumer" reads it from stdin and writes out any blocks that do not match the expected pattern (source code at http://www.tablix.org/~avian/connection_tester/). I'm also monitoring the connection on the Eee end via tcpdump -w.

"consumer" is seeing corruption in the form of one corrupted sequence of 128 bytes. So far all corrupted sequences have the correct lower nibble while the higher nibble appears to come from some other part of the stream (seen offsets 80 and 144 bytes). All corrupted sequences seen start at an offset xxxxx110b. You can see some examples at the URL mentioned above.

The error is always reproducible after a reboot. After one occurrence it does not seem to repeat, even after 10 GB of transferred test data. Reloading the atl1e module does not trigger a repeat. It only seems to happen when the network is saturated: with even a slight delay introduced on the "producer" side, the bug does not appear.

So far I haven't seen the corruption with tcpdump. It is possible I've simply missed it because the Eee isn't capable of logging the full 100 Mbit/s stream without dropping packets.
Comment 18 Martin Buck 2010-10-28 07:58:28 UTC
I can confirm this issue with an Asus Eeepc 1000H (atl1e Ethernet) on kernel 2.6.32 (also tried a selective backport of the 2.6.36 atl1e driver with the same results).

It's reproducible for me after every suspend/resume cycle by running the following command on the Eeepc:
$ ssh OTHERHOST "cat /dev/zero" > /dev/null
Typically, I get a "Corrupted MAC on input" after transferring 2-20MB of data.

After the first occurrence after suspend/resume, I haven't seen this a second time until the next suspend/resume so far.

I can also offer a workaround: Comment out the call to atl1e_rx_checksum() in line 1427 of atl1e_main.c. This ignores the hardware TCP checksum checking and does it in software instead. This causes the corruption to be detected (and fixed by retransmission) already in the TCP layer, so user space doesn't see corrupted data.

Note that I don't assume that hardware TCP checksum checking on ATL1e is broken. I also don't expect the errors to be actual transmission errors on Ethernet (the error probability should be much lower, especially after passing Ethernet checksum checking). Instead, I assume that the corruption happens during DMA transfer from the ATL1e hardware to RAM and since the software checksum is calculated afterwards, it gets detected there.
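
To illustrate why a check done in software catches this kind of damage, here is a minimal Python sketch of the 16-bit ones-complement checksum described in RFC 1071, which is the kind of sum TCP uses. It is not the kernel's code: the real TCP checksum also covers a pseudo-header, and the payload and the position of the overwritten block below are invented for the example.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: ones-complement of the ones-complement 16-bit sum."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Hypothetical payload: 700 sequential 16-bit big-endian words (1400 bytes).
good = struct.pack(">700H", *range(700))

# Simulate the reported damage: one 128-byte block replaced by data from
# elsewhere in the stream (here: the words 1000..1063, an arbitrary choice).
stale = struct.pack(">64H", *range(1000, 1064))
bad = good[:256] + stale + good[384:]

print(hex(internet_checksum(good)))
print(hex(internet_checksum(bad)))
```

A checksum computed after the data has landed in RAM sees the overwritten block, whereas a checksum verified by the NIC before the DMA transfer would not, which matches the assumption stated above. A 16-bit sum cannot catch every possible substitution, so software checksumming reduces the exposure rather than eliminating it.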
Comment 19 Jay Cliburn 2010-10-28 23:39:08 UTC
atl1e is maintained by Atheros directly.  Calling upon Jie Yang to
look at this bug.
Comment 20 Tomaž Šolc 2010-10-29 17:44:32 UTC
I can confirm that suspend/resume also triggers a new, single occurrence of this bug.

There is one odd thing that may or may not be connected to this bug. In atl1e_clean() (atl1e_main.c:1524), I added the following debug message:

imr_data = AT_READ_REG(&adapter->hw, REG_IMR);
AT_WRITE_REG(&adapter->hw, REG_IMR, imr_data | ISR_RX_EVENT);

if(imr_data & ISR_RX_EVENT)
        dev_dbg(&pdev->dev, "ISR_RX_EVENT already turned on: %x\n", imr_data);

The "already turned on" message appears irregularly, about 4 times per second, when the network interface is saturated, which is also the condition under which the data corruption occurs.
Comment 21 orion 2010-10-29 19:57:37 UTC
I'm just going to add some observations to what has been said in this bug so far.

* It seems most of you agree that the bug only happens once after initial boot-up (or suspend/resume). In the tests I ran in comment #10, this was not the case. Using scp, the corrupted MAC message appeared really quickly after initial boot-up. However, it did sometimes trigger even after that first appearance, although it was more difficult to get it to happen.

* It seems to be the shared opinion that it only occurs on network saturation. However, the tests I ran suggested that wasn't the case. By using a rate-limiting program and scp, I managed to get it to trigger even when the traffic level was quite low. Admittedly however, I can't be sure how the rate-limiting program restricts the traffic, so the transfer rate could be very high for extremely short periods of time.

* It was interesting that I couldn't trigger this bug under Windows using the Asus driver supplied. That suggests that either it's an atl1e driver issue, or that the Windows driver provides some way to checksum the data in software that the atl1e driver doesn't, similar to the workaround Martin suggested in comment #18.
Comment 22 Martin Buck 2010-11-17 05:56:05 UTC
Additional data point regarding the data corruption issue on atl1e: I ran an
additional software checksum check and dumped those packets where the
hardware said "checksum OK" but software said "checksum wrong". 

Since I sent known data during those tests, it was quite easy to identify
which bytes were wrong: There was always a block of exactly 128 bytes that
got corrupted. Corruption occurred at various offsets within a packet (always
a multiple of 16) and after receiving variable amounts of data (but
usually within the first few MB of data received after a suspend/resume
cycle and only once after each resume). The corrupted data wasn't random,
but contained data from a previous packet.

The dump below contains one example of a corrupted packet. The sending side
repeatedly sent the same 131072-byte blocks containing 16 bit big-endian
words with the numbers 0...65535 via TCP. As one can see, the bytes received
between timestamps 101788.264351 and 101788.264565 are corrupted and contain
data that was already received 0x8c2f-0xc4bf+0x10000=51056 bytes ago.
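
The test pattern and the distance calculation in the paragraph above can be sketched in Python as follows. This is a reconstruction from the description, not the program actually used. The function names are hypothetical, and reading 0x8c2f as the expected word and 0xc4bf as the word actually seen is an assumption based on the form of the expression.

```python
import struct

BLOCK_WORDS = 0x10000  # the words 0..65535, one 131072-byte block


def pattern_block() -> bytes:
    """One block of the test pattern: 16-bit big-endian words 0..65535."""
    return struct.pack(">65536H", *range(BLOCK_WORDS))


def distance_back(expected_word: int, seen_word: int) -> int:
    """How far back in the repeating pattern the seen word lies.

    Same modular subtraction as 0x8c2f-0xc4bf+0x10000 in the text above,
    in the units used by that expression.
    """
    return (expected_word - seen_word) % BLOCK_WORDS


block = pattern_block()
print(len(block))                      # size of one block in bytes
print(distance_back(0x8C2F, 0xC4BF))   # distance quoted in the text
```

Because the pattern repeats every 0x10000 words, the modulo handles the case where the stale data comes from the previous repetition of the block.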


[101788.262169]  00 23 54 91 14 b1 00 30 48 89 ad 0c 08 00I45 00
[101788.262200]  05 dc f3 26 40 00 40 06 62 92 8b 4f 64 8d 8b 4f
[101788.262231]  64 37X30 39 85 54 8e f3 29 0f 1c 1b f3 2e 80 10
[101788.262262]  00 5b 5e ae 00 00 01 01 08 0a 07 a1 fb 44 01 83
[101788.262292]  25 9d 8a 18 8a 19 8a 1a 8a 1b 8a 1c 8a 1d 8a 1e
[101788.262323]  8a 1f 8a 20 8a 21 8a 22 8a 23 8a 24 8a 25 8a 26
[101788.262353]  8a 27 8a 28 8a 29 8a 2a 8a 2b 8a 2c 8a 2d 8a 2e
[101788.262384]  8a 2f 8a 30 8a 31 8a 32 8a 33 8a 34 8a 35 8a 36
[101788.262424]  8a 37 8a 38 8a 39 8a 3a 8a 3b 8a 3c 8a 3d 8a 3e
[101788.262455]  8a 3f 8a 40 8a 41 8a 42 8a 43 8a 44 8a 45 8a 46
[101788.262485]  8a 47 8a 48 8a 49 8a 4a 8a 4b 8a 4c 8a 4d 8a 4e
[101788.262516]  8a 4f 8a 50 8a 51 8a 52 8a 53 8a 54 8a 55 8a 56
[101788.262546]  8a 57 8a 58 8a 59 8a 5a 8a 5b 8a 5c 8a 5d 8a 5e
[101788.262577]  8a 5f 8a 60 8a 61 8a 62 8a 63 8a 64 8a 65 8a 66
[101788.262608]  8a 67 8a 68 8a 69 8a 6a 8a 6b 8a 6c 8a 6d 8a 6e
[101788.262638]  8a 6f 8a 70 8a 71 8a 72 8a 73 8a 74 8a 75 8a 76
[101788.262669]  8a 77 8a 78 8a 79 8a 7a 8a 7b 8a 7c 8a 7d 8a 7e
[101788.262699]  8a 7f 8a 80 8a 81 8a 82 8a 83 8a 84 8a 85 8a 86
[101788.262730]  8a 87 8a 88 8a 89 8a 8a 8a 8b 8a 8c 8a 8d 8a 8e
[101788.262761]  8a 8f 8a 90 8a 91 8a 92 8a 93 8a 94 8a 95 8a 96
[101788.262791]  8a 97 8a 98 8a 99 8a 9a 8a 9b 8a 9c 8a 9d 8a 9e
[101788.262822]  8a 9f 8a a0 8a a1 8a a2 8a a3 8a a4 8a a5 8a a6
[101788.262852]  8a a7 8a a8 8a a9 8a aa 8a ab 8a ac 8a ad 8a ae
[101788.262883]  8a af 8a b0 8a b1 8a b2 8a b3 8a b4 8a b5 8a b6
[101788.262914]  8a b7 8a b8 8a b9 8a ba 8a bb 8a bc 8a bd 8a be
[101788.262944]  8a bf 8a c0 8a c1 8a c2 8a c3 8a c4 8a c5 8a c6
[101788.262975]  8a c7 8a c8 8a c9 8a ca 8a cb 8a cc 8a cd 8a ce
[101788.263005]  8a cf 8a d0 8a d1 8a d2 8a d3 8a d4 8a d5 8a d6
[101788.263036]  8a d7 8a d8 8a d9 8a da 8a db 8a dc 8a dd 8a de
[101788.263066]  8a df 8a e0 8a e1 8a e2 8a e3 8a e4 8a e5 8a e6
[101788.263097]  8a e7 8a e8 8a e9 8a ea 8a eb 8a ec 8a ed 8a ee
[101788.263128]  8a ef 8a f0 8a f1 8a f2 8a f3 8a f4 8a f5 8a f6
[101788.263158]  8a f7 8a f8 8a f9 8a fa 8a fb 8a fc 8a fd 8a fe
[101788.263189]  8a ff 8b 00 8b 01 8b 02 8b 03 8b 04 8b 05 8b 06
[101788.263219]  8b 07 8b 08 8b 09 8b 0a 8b 0b 8b 0c 8b 0d 8b 0e
[101788.263250]  8b 0f 8b 10 8b 11 8b 12 8b 13 8b 14 8b 15 8b 16
[101788.263280]  8b 17 8b 18 8b 19 8b 1a 8b 1b 8b 1c 8b 1d 8b 1e
[101788.263311]  8b 1f 8b 20 8b 21 8b 22 8b 23 8b 24 8b 25 8b 26
[101788.263341]  8b 27 8b 28 8b 29 8b 2a 8b 2b 8b 2c 8b 2d 8b 2e
[101788.263372]  8b 2f 8b 30 8b 31 8b 32 8b 33 8b 34 8b 35 8b 36
[101788.263403]  8b 37 8b 38 8b 39 8b 3a 8b 3b 8b 3c 8b 3d 8b 3e
[101788.263433]  8b 3f 8b 40 8b 41 8b 42 8b 43 8b 44 8b 45 8b 46
[101788.263464]  8b 47 8b 48 8b 49 8b 4a 8b 4b 8b 4c 8b 4d 8b 4e
[101788.263494]  8b 4f 8b 50 8b 51 8b 52 8b 53 8b 54 8b 55 8b 56
[101788.263525]  8b 57 8b 58 8b 59 8b 5a 8b 5b 8b 5c 8b 5d 8b 5e
[101788.263556]  8b 5f 8b 60 8b 61 8b 62 8b 63 8b 64 8b 65 8b 66
[101788.263586]  8b 67 8b 68 8b 69 8b 6a 8b 6b 8b 6c 8b 6d 8b 6e
[101788.263617]  8b 6f 8b 70 8b 71 8b 72 8b 73 8b 74 8b 75 8b 76
[101788.263647]  8b 77 8b 78 8b 79 8b 7a 8b 7b 8b 7c 8b 7d 8b 7e
[101788.263678]  8b 7f 8b 80 8b 81 8b 82 8b 83 8b 84 8b 85 8b 86
[101788.263708]  8b 87 8b 88 8b 89 8b 8a 8b 8b 8b 8c 8b 8d 8b 8e
[101788.263739]  8b 8f 8b 90 8b 91 8b 92 8b 93 8b 94 8b 95 8b 96
[101788.263770]  8b 97 8b 98 8b 99 8b 9a 8b 9b 8b 9c 8b 9d 8b 9e
[101788.263800]  8b 9f 8b a0 8b a1 8b a2 8b a3 8b a4 8b a5 8b a6
[101788.263831]  8b a7 8b a8 8b a9 8b aa 8b ab 8b ac 8b ad 8b ae
[101788.263861]  8b af 8b b0 8b b1 8b b2 8b b3 8b b4 8b b5 8b b6
[101788.263892]  8b b7 8b b8 8b b9 8b ba 8b bb 8b bc 8b bd 8b be
[101788.263923]  8b bf 8b c0 8b c1 8b c2 8b c3 8b c4 8b c5 8b c6
[101788.263953]  8b c7 8b c8 8b c9 8b ca 8b cb 8b cc 8b cd 8b ce
[101788.263984]  8b cf 8b d0 8b d1 8b d2 8b d3 8b d4 8b d5 8b d6
[101788.264014]  8b d7 8b d8 8b d9 8b da 8b db 8b dc 8b dd 8b de
[101788.264045]  8b df 8b e0 8b e1 8b e2 8b e3 8b e4 8b e5 8b e6
[101788.264076]  8b e7 8b e8 8b e9 8b ea 8b eb 8b ec 8b ed 8b ee
[101788.264106]  8b ef 8b f0 8b f1 8b f2 8b f3 8b f4 8b f5 8b f6
[101788.264137]  8b f7 8b f8 8b f9 8b fa 8b fb 8b fc 8b fd 8b fe
[101788.264167]  8b ff 8c 00 8c 01 8c 02 8c 03 8c 04 8c 05 8c 06
[101788.264198]  8c 07 8c 08 8c 09 8c 0a 8c 0b 8c 0c 8c 0d 8c 0e
[101788.264228]  8c 0f 8c 10 8c 11 8c 12 8c 13 8c 14 8c 15 8c 16
[101788.264259]  8c 17 8c 18 8c 19 8c 1a 8c 1b 8c 1c 8c 1d 8c 1e
[101788.264289]  8c 1f 8c 20 8c 21 8c 22 8c 23 8c 24 8c 25 8c 26
[101788.264320]  8c 27 8c 28 8c 29 8c 2a 8c 2b 8c 2c 8c 2d 8c 2e
XXX Corruption starts here
[101788.264351]  c4 bf c4 c0 c4 c1 c4 c2 c4 c3 c4 c4 c4 c5 c4 c6
[101788.264381]  c4 c7 c4 c8 c4 c9 c4 ca c4 cb c4 cc c4 cd c4 ce
[101788.264412]  c4 cf c4 d0 c4 d1 c4 d2 c4 d3 c4 d4 c4 d5 c4 d6
[101788.264442]  c4 d7 c4 d8 c4 d9 c4 da c4 db c4 dc c4 dd c4 de
[101788.264473]  c4 df c4 e0 c4 e1 c4 e2 c4 e3 c4 e4 c4 e5 c4 e6
[101788.264503]  c4 e7 c4 e8 c4 e9 c4 ea c4 eb c4 ec c4 ed c4 ee
[101788.264534]  c4 ef c4 f0 c4 f1 c4 f2 c4 f3 c4 f4 c4 f5 c4 f6
[101788.264565]  c4 f7 c4 f8 c4 f9 c4 fa c4 fb c4 fc c4 fd c4 fe
XXX Corruption ends here
[101788.264595]  8c 6f 8c 70 8c 71 8c 72 8c 73 8c 74 8c 75 8c 76
[101788.264626]  8c 77 8c 78 8c 79 8c 7a 8c 7b 8c 7c 8c 7d 8c 7e
[101788.264656]  8c 7f 8c 80 8c 81 8c 82 8c 83 8c 84 8c 85 8c 86
[101788.264687]  8c 87 8c 88 8c 89 8c 8a 8c 8b 8c 8c 8c 8d 8c 8e
[101788.264718]  8c 8f 8c 90 8c 91 8c 92 8c 93 8c 94 8c 95 8c 96
[101788.264748]  8c 97 8c 98 8c 99 8c 9a 8c 9b 8c 9c 8c 9d 8c 9e
[101788.264779]  8c 9f 8c a0 8c a1 8c a2 8c a3 8c a4 8c a5 8c a6
[101788.264809]  8c a7 8c a8 8c a9 8c aa 8c ab 8c ac 8c ad 8c ae
[101788.264840]  8c af 8c b0 8c b1 8c b2 8c b3 8c b4 8c b5 8c b6
[101788.264870]  8c b7 8c b8 8c b9 8c ba 8c bb 8c bc 8c bd 8c be
[101788.264901]  8c bf 8c c0 8c c1 8c c2 8c c3 8c c4 8c c5 8c c6
[101788.264932]  8c c7 8c c8 8c c9 8c ca 8c cb 8c cc 8c cd 8c ce
[101788.264962]  8c cf 8c d0 8c d1 8c d2 8c d3 8c d4 8c d5 8c d6
[101788.264993]  8c d7 8c d8 8c d9 8c da 8c db 8c dc 8c dd 8c de
[101788.265023]  8c df 8c e0 8c e1 8c e2 8c e3 8c e4 8c e5 8c e6
[101788.265054]  8c e7 8c e8 8c e9 8c ea 8c eb
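
Locating the bad block in such a payload is easy to automate. The sketch below is not the code that produced the dump; it rebuilds the payload from the values visible above (first word 0x8a18, 724 words, 66 bytes of Ethernet/IP/TCP headers in front) and reports the first run of words that breaks the counting sequence. The function name and layout constants are assumptions made for illustration.

```python
HEADER_BYTES = 66   # 14 (Ethernet) + 20 (IPv4) + 32 (TCP with options)


def find_stale_run(words, first):
    """Return (start_index, length, first_bad_word) of the first run of
    words that deviates from the sequence first, first+1, ... (mod 2**16),
    or None if the payload is intact."""
    start = None
    for i, word in enumerate(words):
        if word != (first + i) & 0xFFFF:
            if start is None:
                start = i
        elif start is not None:
            return start, i - start, words[start]
    if start is not None:
        return start, len(words) - start, words[start]
    return None


# Rebuild the payload of the dumped packet: 724 words starting at 0x8a18,
# with 64 words overwritten by the stale sequence starting at 0xc4bf.
payload = [(0x8A18 + i) & 0xFFFF for i in range(724)]
payload[535:599] = [(0xC4BF + i) & 0xFFFF for i in range(64)]

start, length, seen = find_stale_run(payload, 0x8A18)
frame_offset = HEADER_BYTES + 2 * start

print(frame_offset, 2 * length, hex(seen))   # 1136 128 0xc4bf
print(frame_offset % 16)                     # 0
```

The reported frame offset of 1136 bytes and length of 128 bytes match the region marked in the dump.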
Comment 23 Jay Cliburn 2010-11-17 16:03:03 UTC
Jie, are you looking at this atl1e corruption problem?

Jay

Comment 24 Bill McGonigle 2010-11-17 16:39:17 UTC
I didn't see Jie on the cc: list (on the address I have for him) and this bugzilla doesn't know that address, so I forwarded him a link to Martin's latest comment.
Comment 25 Jonathan Nieder 2011-08-29 22:22:43 UTC
From <http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=622259#65>, it seems that the Ethernet adapter still gives corrupted packets after resume in v2.6.39.
Comment 26 Jonathan Nieder 2011-08-29 22:24:43 UTC
Created attachment 70832 [details]
workaround from comment #18
Comment 27 Bill McGonigle 2011-08-30 00:43:52 UTC
Is there a way to apply this workaround only to a blacklist of known-problematic hardware?  Software checksumming isn't the end of the world on these classes of devices, and Linux ought to favor uncorrupted data over potentially faster but known-bad data.

Atheros has known about this problem for two years, so I suspect it's something they can't fix directly, such as a hardware bug.  Linux usually deals with buggy hardware through blacklists.
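
To illustrate what I mean by a blacklist, here is a toy model in Python rather than kernel code. The table, the function name, and the PCI ID entry are all illustrative assumptions; the ID in particular should be checked against `lspci -nn` on an affected machine before anyone relies on it.

```python
from typing import NamedTuple


class PciId(NamedTuple):
    vendor: int
    device: int


# Hypothetical blacklist of adapters whose receive checksum offload
# should not be trusted. The single entry is an assumption, not a
# verified list of affected hardware.
RX_CSUM_BLACKLIST = frozenset({
    PciId(0x1969, 0x1026),
})


def trust_hw_rx_checksum(dev: PciId, blacklist=RX_CSUM_BLACKLIST) -> bool:
    """Return False if received packets from this device should be
    re-verified in software even when hardware reports 'checksum OK'."""
    return dev not in blacklist


print(trust_hw_rx_checksum(PciId(0x1969, 0x1026)))   # False
print(trust_hw_rx_checksum(PciId(0x8086, 0x100E)))   # True
```

Devices not on the list keep the fast path; listed devices pay for a software check on every received packet.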
