Bug 29082

Summary: asix_rx_fixup() bad header length and Bad RX Length in dmesg
Product: Drivers Reporter: Carl Hetherington (cth)
Component: NetworkAssignee: drivers_network (drivers_network)
Status: RESOLVED CODE_FIX    
Severity: normal CC: alan, emilgoode, igor.raits, kernel, lipski.bogdan, lvml, myxal.mxl, slava, windmail01, zarhan
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 3.4 Subsystem:
Regression: No Bisected commit-id:
Attachments: Output of lsusb -vvxxx
syslog from working session
syslog from non-working session
Test patch
Test patch for preventing package loss on AX88772B.

Description Carl Hetherington 2011-02-13 23:32:35 UTC
Created attachment 47682 [details]
Output of lsusb -vvxxx

Using a Edimax EU-4207 USB ethernet adapter on a HP Probook 4520s.

Output of lsusb -vvxxx attached.

If the adapter is plugged in on boot, the adapter lights up but does not pass network traffic, and repeated "asix_rx_fixup() bad header length" and "asix_rx_fixup() Bad RX Length messages" appear in dmesg.

On unplugging and replugging the adapter, everything works fine.

I have also seen the bad messages start to appear after a period of the card working fine.

I recompiled asix.ko and usbnet.ko with DEBUG=1, and I attach two syslogs; "unhappy" is when the errors are occurring, and "happy" is when they are not.  Both these logs start from insmod asix.ko.
Comment 1 Carl Hetherington 2011-02-13 23:33:08 UTC
Created attachment 47692 [details]
syslog from working session
Comment 2 Carl Hetherington 2011-02-13 23:33:46 UTC
Created attachment 47702 [details]
syslog from non-working session
Comment 3 Carl Hetherington 2011-02-13 23:35:52 UTC
In the logs there are outputs from a few additional printks that I added.  It seems that in the "bad" condition the asix_rx_fixup function continuously receives SKBs of length 2048, as opposed to shorter SKBs of variable length in the "good" condition.
Comment 4 Lutz Vieweg 2012-07-01 18:52:14 UTC
I can confirm this bug to still exist in the current git head, a recent patch posted here http://www.mail-archive.com/stable@vger.kernel.org/msg09600.html does _not_ fix the issue for me (but seems to address a similar issue that happens only with VLAN tagging, while the "Bad RX Length" appear for about 0.6% of packets I send via the AXIS USB-to-EtherNet adapter).

For those who experience this problem with the Asus-supplied AXIS adapter, I have described a work-around here: http://www.linlap.com/wiki/asus+ux32vd#asix_usb-to-ethernet_adapter (but continously using a patched manufacturer driver does not leave me too happy).
Comment 5 Alan 2012-08-15 22:17:43 UTC
Useful to know there is a vendor driver which doesn't go wrong - it may help identify the difference.
Comment 6 Slava Gorbunov 2013-12-19 08:24:31 UTC
I confirm this bug, it is still present in linux-3.11.2, with "ASIX AX88772 USB 2.0 Ethernet" (according to dmesg). The adapter works in both cases - whether it was plugged in at boot time or not, but message "asix_rx_fixup() Bad Header Length 0xf0823dfb, offset 16" is periodically logged, especially when moderate traffic (about several mbytes/sec) passes through it .
Comment 7 Slava Gorbunov 2013-12-19 18:28:45 UTC
I've noticed interesting peculiarity: the 'Bad header length' message is logged very often (one time per several seconds) during uploading large file through fuse/sshfs, but not logged at all (or logged very infrequently - I didn't observe it for at least 4 minutes) when piping /dev/urandom through netcat. So I suppose that the problem is triggered by packets of specific size (or by sequence of packets with specific sizes).
Comment 8 Slava Gorbunov 2013-12-19 18:52:27 UTC
I've finally discovered the easy way to reproduce this bug.

ping -s 1965 $HOSTNAME

causes repeated messages

Dec 19 22:31:30 fileserver kernel: asix 1-1:1.0 eth1: asix_rx_fixup() Bad Header Length 0xd3c8e7c3, offset 6

and no icmp replies from host being pinged.

Both hosts are connected to 100mbit ethernet with standard mtu=1500, so this packet is splitted into two packets with sizes 1514 and 527. 'ping -s 485', which produces one packet of size 527, doesn't trigger this bug; 'ping -s 1964' (which emits two packets 1514 and 526 bytes long) doesn't trigger it too. Two packets of legth 1514 and 527 generated by successive ping commands also doesn't trigger it (perhaps because they have larger time interval between them).

So the bug is triggered by two successive packets of sepcific sizes (specifically 1514 then (any from 527 to 538) bytes, perhaps there are other combinations too) recieved in a short time (perhaps sufficiently short to be buffered together).

I hope my investigation will be useful.
Comment 9 Slava Gorbunov 2013-12-21 00:23:15 UTC
I've found the reason of this problem. The device may send several ethernet packets to the host at once, and the total size of such a pack is limited by 2048 bytes. But it is the summary size of raw ethernet frames; when the device sends them to the host, each ethernet frame is prefixed by 32bit header (with frame size and some kind of checksum for this size). As a result, the actual size of data being transfered may exceed dev->rx_urb_size, which is set to 2048. I don't understand what exactly happens in this case (the data is either truncated or, even worse, written beyond the end of the buffer), but, anyway, asix_rx_fixup doesn't see the rest of the data, tries to read it from the begining of next 2048 bytes received and "lose sync".

The increase of rx_urb_size to 4096 solves the problem (at least for me):

diff -ur drivers/net/usb.orig/asix_devices.c drivers/net/usb/asix_devices.c
--- drivers/net/usb.orig/asix_devices.c	2013-12-21 02:25:44.372538206 +0400
+++ drivers/net/usb/asix_devices.c	2013-12-21 03:50:14.406536611 +0400
@@ -488,7 +488,7 @@
 	if (dev->driver_info->flags & FLAG_FRAMING_AX) {
 		/* hard_mtu  is still the default - the device does not support
 		   jumbo eth frames */
-		dev->rx_urb_size = 2048;
+		dev->rx_urb_size = 4096;
 	}
 
 	dev->driver_priv = kzalloc(sizeof(struct asix_common_private), GFP_KERNEL);

The exact rx_urb_size could be selected more precisely - 4096 is more than enough. Also I'm not sure that this solution is complete - rx_urb_size seems to be changed in other places (for example, in usbnet.c when changing MTU).
Comment 10 Alan 2013-12-21 00:35:31 UTC
I wonder if the MFB_2048 etc values are '2048 bytes + a header' in the devicce itself ?

If you change  ax88178_set_mfb to do

dev->rx_urb_size +=32;

just before it does rxctl=asix_read_rx_ctl(dev)

and change the line you already did to 2048 + 32

does it then behave - and do the combinations 32 bytes higher still work ?

Alan
Comment 11 Alan 2013-12-23 11:43:19 UTC
*** Bug 30952 has been marked as a duplicate of this bug. ***
Comment 12 Alan 2013-12-23 11:43:37 UTC
*** Bug 16831 has been marked as a duplicate of this bug. ***
Comment 13 Alan 2014-01-29 14:48:32 UTC
*** Bug 69671 has been marked as a duplicate of this bug. ***
Comment 14 Igor Raits 2014-01-29 15:28:30 UTC
Ok. I'll test it after 2h..
Comment 15 Igor Raits 2014-01-29 19:24:31 UTC
(In reply to Alan from comment #10)
> I wonder if the MFB_2048 etc values are '2048 bytes + a header' in the
> devicce itself ?
> 
> If you change  ax88178_set_mfb to do
> 
> dev->rx_urb_size +=32;
> 
> just before it does rxctl=asix_read_rx_ctl(dev)
> 
> and change the line you already did to 2048 + 32
> 
> does it then behave - and do the combinations 32 bytes higher still work ?
> 
> Alan
Well. Thank you. I've testing this patch:

diff --git a/drivers/net/usb/asix_devices.c b/drivers/net/usb/asix_devices.c
index 9765a7d..6d60fc2 100644
--- a/drivers/net/usb/asix_devices.c
+++ b/drivers/net/usb/asix_devices.c
@@ -487,7 +487,7 @@ static int ax88772_bind(struct usbnet *dev, struct usb_interface *intf)
 	if (dev->driver_info->flags & FLAG_FRAMING_AX) {
 		/* hard_mtu  is still the default - the device does not support
 		   jumbo eth frames */
-		dev->rx_urb_size = 2048;
+		dev->rx_urb_size = 2048 + 32;
 	}
 
 	dev->driver_priv = kzalloc(sizeof(struct asix_common_private), GFP_KERNEL);
@@ -746,6 +746,7 @@ static void ax88178_set_mfb(struct usbnet *dev)
 		mfb = AX_RX_CTL_MFB_16384;
 	}
 
+	dev->rx_urb_size += 32;
 	rxctl = asix_read_rx_ctl(dev);
 	asix_write_rx_ctl(dev, (rxctl & ~AX_RX_CTL_MFB_16384) | mfb);

And it works for me (basic tests). I will do more testing tomorrow, but I think it will work.

Could you prepare correctly patch for commiting to the git ?
Feel free to add: Tested-by: Igor Gnatenko <i.gnatenko.brain@gmail.com>
Comment 16 Michal Zatloukal 2014-01-29 19:47:37 UTC
If you have another machine on the same LAN, I suggest testing with "endless zeros":
machine A: nc -l 4444 </dev/zero >/dev/null
machine B: nc <machineA-IP> 4444 </dev/zero >/dev/null

Back when I had the ASIX NIC this would trigger the bug within 30 seconds.

Watch the bandwidth with ifstat or similar.
Comment 17 Alan 2014-01-29 23:46:46 UTC
If it passes testing I will do so
Comment 18 Igor Raits 2014-01-30 06:10:17 UTC
mh. sad news.

Jan 30 10:09:08 X1Carbon.localdomain kernel: asix 1-1.2:1.0 enp0s26u1u2: asix_rx_fixup() Bad Header Length 0x9f2750ba, offset 4
Jan 30 10:09:09 X1Carbon.localdomain kernel: asix 1-1.2:1.0 enp0s26u1u2: asix_rx_fixup() Bad Header Length 0xc63f9f61, offset 4
Jan 30 10:09:09 X1Carbon.localdomain kernel: asix 1-1.2:1.0 enp0s26u1u2: asix_rx_fixup() Bad Header Length 0x8d231476, offset 4
Jan 30 10:09:09 X1Carbon.localdomain kernel: asix 1-1.2:1.0 enp0s26u1u2: asix_rx_fixup() Bad Header Length 0x2b0add58, offset 4
Jan 30 10:09:10 X1Carbon.localdomain kernel: asix 1-1.2:1.0 enp0s26u1u2: asix_rx_fixup() Bad Header Length 0xe9eab2bc, offset 4
Jan 30 10:09:10 X1Carbon.localdomain kernel: asix 1-1.2:1.0 enp0s26u1u2: asix_rx_fixup() Bad Header Length 0xe4f91d4d, offset 4
Jan 30 10:09:11 X1Carbon.localdomain kernel: asix 1-1.2:1.0 enp0s26u1u2: asix_rx_fixup() Bad Header Length 0x7ba31fe0, offset 4
Comment 19 Alan 2014-01-30 14:47:00 UTC
Digging in the info I have can you reverse any other changes and apply the patch I'll attach in a moment and see if that helps. If not can you retry after changing

+       return asix_rx_fixup_internal(dev, skb, rx, 3 * sizeof(u16), 3);

in the patch to 

+       return asix_rx_fixup_internal(dev, skb, rx, 2 * sizeof(u16), 3);
Comment 20 Alan 2014-01-30 14:47:38 UTC
Created attachment 123871 [details]
Test patch
Comment 21 Igor Raits 2014-01-31 09:43:30 UTC
(In reply to Alan from comment #20)
> Created attachment 123871 [details]
> Test patch

I'm testing this patch w/o/ previous fix. Still buggy.

Testing 

+       return asix_rx_fixup_internal(dev, skb, rx, 2 * sizeof(u16), 3);
Comment 22 Igor Raits 2014-01-31 11:01:28 UTC
Yes.
With
return asix_rx_fixup_internal(dev, skb, rx, 2 * sizeof(u16), 3);

It works for me. BUt getting weird msgs from NM.

Jan 31 14:57:54 X1Carbon.localdomain NetworkManager[839]: <info> Activation (enp0s20u2) Stage 5 of 5 (IPv6 Commit) complete.
Jan 31 14:57:58 X1Carbon.localdomain NetworkManager[839]: <info> Activation (enp0s20u2) Stage 5 of 5 (IPv6 Commit) scheduled...
Jan 31 14:57:58 X1Carbon.localdomain NetworkManager[839]: <info> Activation (enp0s20u2) Stage 5 of 5 (IPv6 Commit) started...
Jan 31 14:57:58 X1Carbon.localdomain NetworkManager[839]: <info> Activation (enp0s20u2) Stage 5 of 5 (IPv6 Commit) complete.
Jan 31 14:58:51 X1Carbon.localdomain NetworkManager[839]: <info> Activation (enp0s20u2) Stage 5 of 5 (IPv6 Commit) scheduled...
Jan 31 14:58:51 X1Carbon.localdomain NetworkManager[839]: <info> Activation (enp0s20u2) Stage 5 of 5 (IPv6 Commit) started...
Jan 31 14:58:51 X1Carbon.localdomain NetworkManager[839]: <info> Activation (enp0s20u2) Stage 5 of 5 (IPv6 Commit) complete.
Jan 31 14:58:55 X1Carbon.localdomain NetworkManager[839]: <info> Activation (enp0s20u2) Stage 5 of 5 (IPv6 Commit) scheduled...
Jan 31 14:58:55 X1Carbon.localdomain NetworkManager[839]: <info> Activation (enp0s20u2) Stage 5 of 5 (IPv6 Commit) started...
Jan 31 14:58:55 X1Carbon.localdomain NetworkManager[839]: <info> Activation (enp0s20u2) Stage 5 of 5 (IPv6 Commit) complete.
Jan 31 14:58:59 X1Carbon.localdomain NetworkManager[839]: <info> Activation (enp0s20u2) Stage 5 of 5 (IPv6 Commit) scheduled...
Jan 31 14:58:59 X1Carbon.localdomain NetworkManager[839]: <info> Activation (enp0s20u2) Stage 5 of 5 (IPv6 Commit) started...
Jan 31 14:58:59 X1Carbon.localdomain NetworkManager[839]: <info> Activation (enp0s20u2) Stage 5 of 5 (IPv6 Commit) complete.
Comment 23 Igor Raits 2014-01-31 11:40:36 UTC
Oh. bad news. after some time got again err..

[ 2807.803177] asix 3-2:1.0 enp0s20u2: asix_rx_fixup() Bad Header Length 0x66db500, offset 8
[ 2829.395928] asix 3-2:1.0 enp0s20u2: asix_rx_fixup() Bad Header Length 0x1b72148, offset 16
[ 2834.755826] asix 3-2:1.0 enp0s20u2: asix_rx_fixup() Bad Header Length 0xf738066d, offset 10
[ 2836.130169] asix 3-2:1.0 enp0s20u2: asix_rx_fixup() Bad Header Length 0xaf3f9d21, offset 16
[ 3078.781824] asix 3-2:1.0 enp0s20u2: asix_rx_fixup() Bad Header Length 0xf738066d, offset 10
[ 3102.137216] asix 3-2:1.0 enp0s20u2: asix_rx_fixup() Bad Header Length 0xf738066d, offset 10
[ 3110.797625] asix 3-2:1.0 enp0s20u2: asix_rx_fixup() Bad Header Length 0xf738066d, offset 10
[ 3119.510456] asix 3-2:1.0 enp0s20u2: asix_rx_fixup() Bad Header Length 0xf738066d, offset 10
[ 3403.743047] asix 3-2:1.0 enp0s20u2: asix_rx_fixup() Bad Header Length 0x66db500, offset 8
[ 3409.607052] asix 3-2:1.0 enp0s20u2: asix_rx_fixup() Bad Header Length 0xffffffff, offset 10
[ 3745.013895] asix 3-2:1.0 enp0s20u2: asix_rx_fixup() Bad Header Length 0x1b00f738, offset 12
Comment 24 Emil Goode 2014-02-04 19:11:59 UTC
Created attachment 124481 [details]
Test patch for preventing package loss on AX88772B.
Comment 25 Emil Goode 2014-02-04 19:18:34 UTC
I also noticed this bug and I have found that the problem is that the usbnet
module discards buffers when skb->len is less than dev->net->hard_header_len.
Since the AX88772B occasionally places more than one packet in a single urb,
causing the amount of data to exceed the dev->rx_urb_size, this causes the
next urb to contain the remaining headerless data of the last packet in the
previous urb. When the remaining data is of less than hard_header_len number
of octets the buffer is discarded.

The default value of hard_header_len is 14 octets, which causes the following
behavior:

ping -c 1 -s 1976 $HOSTNAME

In the above case the we get two packets of the sizes 1514+4 and 538+4 in
one urb. This is a total of 2060 octets which leaves 12 remaining octets
to be sent in the next urb since rx_urb_size is 2048.
But instead the next urb contains a new packet at offset 0, which means we
lost the remaining data of the last package in the previous urb.

ping -c 1 -s 1977 $HOSTNAME

In this case the second urb contains 14 octets of header less data.
We sent only one more octet so it should be 13, but the rx_fixup function
rounds it to an even number.

The following is what the code in the rx_complete function in usbnet.c does:

	if (skb->len < dev->net->hard_header_len) {
		state = rx_cleanup;
		^^^^^^^^^^
		dev->net->stats.rx_errors++;
		dev->net->stats.rx_length_errors++;
	}

	...

	/* stop rx if packet error rate is high */
        if (++dev->pkt_cnt > 30) {
                dev->pkt_cnt = 0;
                dev->pkt_err = 0;
        } else {
                if (state == rx_cleanup)
                        dev->pkt_err++;
			^^^^^^^^^^
                if (dev->pkt_err > 20)
                        set_bit(EVENT_RX_KILL, &dev->flags);
        }

The quick fix is to set dev->net->hard_header_len to 0, you can test the
temporary patch in the previous post. However I don't think this is a pretty solution as 
I believe the hard_header_len value is also used when transmitting packages.
Perhaps there should be a way to indicate to the usbnet module that it should
make an exception to the hard_header_len test in a situation like this?

Best regards,

Emil Goode
Comment 26 Igor Raits 2014-02-05 08:25:09 UTC
(In reply to Emil Goode from comment #24)
> Created attachment 124481 [details]
> Test patch for preventing package loss on AX88772B.

Yep. This patch fixed my bug.
Comment 27 Emil Goode 2014-02-06 09:55:18 UTC
Ok, I will send a patch with this fix.
Comment 28 Igor Raits 2014-02-16 20:09:34 UTC
any news here ? This is really uncomfortably...
Comment 29 Emil Goode 2014-02-17 11:22:46 UTC
I have sent the below patch instead to fix this problem, it's waiting for testing/maintainer review.

http://marc.info/?l=linux-kernel&m=139231010015039&w=2
Comment 30 Igor Raits 2014-02-17 13:14:41 UTC
(In reply to Emil Goode from comment #29)
> I have sent the below patch instead to fix this problem, it's waiting for
> testing/maintainer review.
> 
> http://marc.info/?l=linux-kernel&m=139231010015039&w=2

Should I test something ?
Comment 31 Emil Goode 2014-02-17 13:52:04 UTC
I have tested it on a AX88772B, if you have a different device that would be great.
Comment 32 Igor Raits 2014-02-17 14:13:46 UTC
I have interesting device. https://bugzilla.kernel.org/attachment.cgi?id=123781

I will test patch.
Comment 33 Emil Goode 2014-02-17 20:30:35 UTC
The patch has been applied and queued for stable kernels.

commit 9e88626331cf11d45adaf32aadd1f245dcfcdf05
Author: Emil Goode <emilgoode@gmail.com>
Date:   Wed Feb 12 17:37:53 2014 +0100

    usbnet: remove generic hard_header_len check
Comment 34 Bob Ren 2014-02-19 14:41:43 UTC
(In reply to Emil Goode from comment #31)
> I have tested it on a AX88772B, if you have a different device that would be
> great.

I'm a little confused. Isn't ax88179_178a.c supposed to be the driver for ax88179 or ax88178a? Can it work with ax88772b?
Comment 35 Emil Goode 2014-02-20 14:59:17 UTC
The AX88772B uses the asix module. There is a separate minidriver for the AX88179 and AX88178A.
Comment 36 Igor Raits 2014-03-01 15:13:53 UTC
Let's close this bug!

Emil, thank you for patch!