Bug 45071

Summary: r8712u wireless card fails to connect properly
Product: Drivers
Reporter: Pierpaolo Valerio (gondsman)
Component: Staging
Assignee: drivers_staging (drivers_staging)
Status: RESOLVED CODE_FIX
Severity: high
CC: alan, florian, gondsman, Larry.Finger, linville, philippuryear
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.4 - 3.5
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: Patch to reduce the allocated buffer size from the ridiculous value of 40720 to 9100
Patch to fix SSL problem.

Description Pierpaolo Valerio 2012-07-23 11:52:11 UTC
I use a D-Link DWA-131 wireless stick with the r8712u driver. I have problems whenever I connect to https sites (with both browsers and CLI programs). Even downloading something with wget from an https website fails repeatedly with the error "SSL3_GET_RECORD: decryption failed or bad record mac". It is definitely a problem with the r8712u module, as the wired network works fine, and the LTS kernel provided by my distribution (Arch Linux, 3.0 branch) doesn't show any problem either. The logs don't show anything wrong with the firmware loading, nor any other message related to this module.
The bug is there at least for kernel versions 3.4 and 3.5.
Let me know if I can provide any additional information.
Comment 1 Larry Finger 2012-07-23 16:42:35 UTC
I have been able to get this bug to occur, but it is not consistent. Using x86_64 architecture with openSUSE 12.1, it worked for kernel 3.3.0 and failed for kernel 3.4.0. Using openSUSE 12.2-RC1, 3.4.0 also failed, but 3.5-rc7 works. On a different x86_64 machine, 3.4 worked.

I have started a bisection between 3.5 and 3.3 to see if I can find the problem.
Comment 2 Larry Finger 2012-07-23 19:37:03 UTC
Bisection was a bust. All the generated kernels worked.

Please provide the results for:

md5sum /lib/firmware/rtlwifi/rtl8712u.bin
ls -l  /lib/firmware/rtlwifi/rtl8712u.bin
Comment 3 Pierpaolo Valerio 2012-07-23 20:35:51 UTC
$ md5sum /lib/firmware/rtlwifi/rtl8712u.bin
8e6396b5844a3e279ae8679555dec3f0  /lib/firmware/rtlwifi/rtl8712u.bin

$ ls -l  /lib/firmware/rtlwifi/rtl8712u.bin
-rw-r--r-- 1 root root 129304 Jun 15 04:10 /lib/firmware/rtlwifi/rtl8712u.bin

I have to say that the behaviour is somewhat inconsistent here too: from time to time it manages to connect without a hitch, but the vast majority of the time (19 out of 20 or so) it fails. The tangible result is that, for example, the chat part of Google+ constantly acts up and Gmail can't download new messages.
Comment 4 Larry Finger 2012-07-23 21:16:28 UTC
Your firmware is OK.
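
A firmware check of this kind can be scripted. The following is a minimal sketch, assuming the checksum reported in comment 3 is the known-good value for this firmware file; check_md5 is a hypothetical helper name, not part of any package.

```shell
#!/bin/sh
# check_md5 FILE EXPECTED: succeed only if FILE's MD5 digest equals EXPECTED.
# (check_md5 is a hypothetical helper name used for illustration.)
check_md5() {
    # md5sum prints "<digest>  <file>"; keep only the digest.
    actual=$(md5sum "$1" 2>/dev/null | cut -d ' ' -f 1)
    [ "$actual" = "$2" ]
}

# Example call (checksum from comment 3, assumed to be the known-good value):
# check_md5 /lib/firmware/rtlwifi/rtl8712u.bin 8e6396b5844a3e279ae8679555dec3f0 \
#     && echo "firmware OK"
```

A missing or unreadable file yields an empty digest, so the comparison fails rather than passing by accident.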

I was using this bugzilla entry to test. Unfortunately, it seems to work more often than it fails here. If you can provide me with the wget command that fails, perhaps I will be able to give it a more severe test. I still have the intermediate kernels from the bisection.
Comment 5 Pierpaolo Valerio 2012-07-23 23:54:23 UTC
Thanks for the replies.
The command I use to test is
wget 'https://spideroak.com/directdownload?platform=centos&arch=x86_64'
It fails (and then retries) every couple of seconds on kernel 3.4 and 3.5-rc7 (I'll try to build kernel 3.5 again later), while it works 100% of the time with kernel 3.0.
Comment 6 Larry Finger 2012-07-24 02:57:05 UTC
I wrote a script to test the transfers:

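#!/bin/sh
# Attempt the HTTPS download 10 times, pausing 150 seconds after each attempt.
for i in 1 2 3 4 5 6 7 8 9 10 ; do
    echo "$i"
    # Quote the URL so the shell does not treat '&' as "run in background".
    wget 'https://spideroak.com/directdownload?platform=centos&arch=x86_64'
    sleep 150
done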


I have assumed that if the system survives all 10 of these transfers, then it is OK. Retesting my bisection involving only this driver resulted in the "bad" commit being the addition of new USB IDs. As this is absurd, it is clear that some other change in the kernel is making r8712u fail.
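
The pass/fail criterion described here can be made explicit by counting failed attempts rather than reading the wget output by eye. The following is a minimal sketch, not the script that was actually used; count_failures is a hypothetical helper name, and it relies only on the command's exit status.

```shell
#!/bin/sh
# count_failures N CMD [ARGS...]: run CMD N times and print how many runs
# exited with a non-zero status. (Hypothetical helper for illustration.)
count_failures() {
    n=$1
    shift
    fails=0
    i=1
    while [ "$i" -le "$n" ]; do
        "$@" >/dev/null 2>&1 || fails=$((fails + 1))
        i=$((i + 1))
    done
    echo "$fails"
}

# Example call: 10 attempts at the download from comment 5.
# count_failures 10 wget -O /dev/null \
#     'https://spideroak.com/directdownload?platform=centos&arch=x86_64'
```

A result of 0 corresponds to "survives all 10 transfers". Note that wget retries internally by default, so a transfer that hit the SSL error but eventually completed may still be counted as a success; this sketch does not try to detect that case.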

I have started a bisection of the whole kernel to find where that problem occurs.
Comment 7 Larry Finger 2012-07-26 19:16:41 UTC
This problem was bisected to commit c862815, which is entitled "tcp: reduce out_of_order memory use". As other network drivers are OK with this code, it is obvious that the problem is in r8712u. I posted the finding on LKML and to the authors of the patch. I hope I get some ideas from them.
Comment 8 Larry Finger 2012-08-22 21:38:47 UTC
Created attachment 78241 [details]
Patch to reduce the allocated buffer size from the ridiculous value of 40720 to 9100

My posting of this problem got no suggestions; however, when I bisected a problem with different drivers to the same commit and posted that question, I got some suggestions. This patch appears to fix it for me. Does it help your situation?
Comment 9 Pierpaolo Valerio 2012-08-23 09:23:28 UTC
Thank you very much. I'll get back to you in a couple of days, when I have time to compile the patched kernel.
Comment 10 Larry Finger 2012-08-23 21:00:04 UTC
Do not bother. I got a false positive when testing the patch. Sometimes, 10 tries are not enough. I'll let you know when/if I get a patch that actually works.
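
To illustrate why 10 tries may not be enough: if each transfer independently fails with probability p, the chance that n transfers in a row all succeed on a still-broken kernel is (1 - p)^n. The following sketch computes that value; the independence assumption and the example value p = 0.1 are illustrative only and were not measured in this report, and all_pass_probability is a hypothetical helper name.

```shell
#!/bin/sh
# all_pass_probability P N: probability that N independent trials all succeed
# when each one fails with probability P. (Hypothetical helper; P is assumed.)
all_pass_probability() {
    awk -v p="$1" -v n="$2" 'BEGIN { printf "%.3f\n", (1 - p) ^ n }'
}

# With an assumed 10% failure rate per transfer, 10 clean transfers in a row
# still happen about a third of the time:
all_pass_probability 0.1 10    # prints 0.349
```

Under these assumptions, 29 trials would be needed to push the false-positive chance below 5%.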
Comment 11 Larry Finger 2012-08-31 22:56:10 UTC
The problem is something wrong with the x86_64 version. If you run with a 32-bit system, it is fine. I have stared at the code for about a week without seeing anything wrong.
Comment 12 Pierpaolo Valerio 2012-09-03 13:33:17 UTC
I don't know if it can help in any way (I wish I could code...), but I noticed that with my 3.0 kernel with working drivers, iwconfig reports the bitrate of my connection (in perfect conditions) as 144Mb/s, while with the 3.5 kernel it says 150Mb/s. Manually changing the speed or disabling 802.11n support doesn't make a difference for the bug, though.
Comment 13 Larry Finger 2012-09-03 15:56:43 UTC
This speed change is totally irrelevant. The problem is somewhere in the receive path, and is found only on 64-bit systems. The problem is not seen when using a 32-bit distro.

Compounding the problem is that it is intermittent, as it appears to require out-of-order TCP fragments. As I said in comment #11, I have already spent a lot of time on this problem without getting anywhere. At the moment, I am trying to clear all the work that backed up then. When that is finished, I will look at this problem again.
Comment 14 Larry Finger 2012-09-10 18:32:38 UTC
Created attachment 79601 [details]
Patch to fix SSL problem.

This patch by Eric Dumazet fixes the SSL problem for me.

A second bug has been found recently. For some, but not all devices, the latest firmware from Realtek leads to disconnects. A request to revert to an older version has been submitted to the linux-firmware repo.
Comment 15 Pierpaolo Valerio 2012-09-17 00:44:51 UTC
I can confirm that the patch seems to work just fine. In what kernel version can I expect it to be merged?
In any case, thank you very much for the help, I'm telling all my colleagues how awesome the kernel developers are! :)
Comment 16 Larry Finger 2012-09-17 01:28:49 UTC
It has been merged into kernel 3.6-rc6. As it was posted with a Cc to Stable, it will soon be in all the stable versions. I just got E-mail that it has been accepted into what will be 3.2.30. I expect that it will soon be in 3.5.5, 3.4.12, and 3.0.44.

Kernel devs try to please. :)
Comment 17 Florian Mickler 2012-09-19 22:15:02 UTC
A patch referencing this bug report has been merged in Linux v3.6-rc6:

commit abf02cfc179bb4bd30d05f582d61b3b8f429b813
Author: Eric Dumazet <edumazet@google.com>
Date:   Mon Sep 10 21:22:11 2012 +0200

    staging: r8712u: fix bug in r8712_recv_indicatepkt()