The kernel oopsed as shown in the attached picture while I was trying to find optimal AP settings. Moments earlier I had changed the network type on the AP from 802.11n to 802.11g. I then told NetworkManager to reconnect, and the oops happened. At that point I went to sleep. In the morning I booted 3.4-rc1 three times, and every boot ended in a kernel oops. Those oopses are shown in the other attached pictures. After the third attempt I booted the previous kernel, which in my case was 3.3.0-gentoo, and did not encounter any oopses with it. The performance of the network card is still lacking, but at least the kernel no longer oopses. More about my struggles with rtl8192ce, plus additional details, is available at https://forums.gentoo.org/viewtopic-t-918728.html .
Created attachment 72818 [details] Kernel oops message when connecting to 802.11g network. I had just changed the network type from 802.11n to 802.11g and ordered NetworkManager to reconnect.
Created attachment 72819 [details] 1st boot when the network card is probably attaching to wlan network.
Created attachment 72820 [details] 2nd boot similar to the first one Note that after the first oops in the evening I hadn't touched the settings in the AP.
Created attachment 72821 [details] 3rd boot similar to the first and second
Created attachment 72824 [details] Trial patch to fix oops This patch should prevent the oops; however, it does not fix the underlying cause. I am hoping that making this change will help in gathering more information about what is happening. I am unable to reproduce this problem when connecting to either an 802.11n or 802.11g AP. Does your system have CRDA installed?
[I] net-wireless/crda
     Available versions: 1.0.1-r1 ~1.1.2-r3
     Installed versions: 1.0.1-r1(22.13.52 02.04.2012)
I'll apply the patch and test the patched kernel tonight.
The patch fixed the oops. I tested again both without and with the patch, and without it I hit the same error again. In the meantime I upgraded NetworkManager to version 0.9.4.0, but that didn't help. Now that I can boot without a kernel oops, I have started getting the attached messages in dmesg.
Created attachment 72834 [details] dmesg containing the stacktraces This is from recompiled kernel. I added some debug-settings to the new kernel.
Created attachment 72835 [details] Kernel config. This kernel config was used for compiling the patched kernel.
Testing with speedtest.net gave me the following average results: ping 30ms, up/down 6Mbps/5.5Mbps. That is an improvement over 3.3.0, where I usually got ~35-40ms and less than 3Mbps. I tried moving the computer around so that the heat radiator behind it wouldn't interfere. Moving the case more than 0.5m away from its original location, which was quite close to the radiator, didn't improve the performance. The radiator is a thin metal radiator with circulating water, located ~15cm away from the antennas. I tested an 802.11b/g/n connection to another AP (D-Link) but couldn't move any data (latency error on speedtest.net). When I tested against the original AP with an 802.11n network at 144Mb/s, the stacktraces vanished, but the speed went down to ping 32ms, down 1.1Mbps, up 0.6Mbps.
Created attachment 72836 [details] dmesg containing some 802.11g stacktraces and then the 802.11n messages.
The module options were:
  options rtl8192ce swenc=1 debug=3 ips=0
The AP configuration was 802.11n, 20MHz.
Created attachment 72837 [details] In this dmesg part I configured the AP to 802.11n, 40Mhz, Control Sideband: Upper In this dmesg part I configured the AP to 802.11n, 40Mhz, Control Sideband: Upper, channel 13 which is least crowded. speedtest.net gives me ping 31ms, down 2Mbps, up 3 Mbps.
Your dmesg output wrapped around and obliterated the part where the failures first started. You need to capture the output earlier, or change the size of the buffer. That is controlled by CONFIG_LOG_BUF_SHIFT in .config. You currently have a value of 18 - please increase it to 20. Did you notice that once the driver called CRDA, the WARN messages stopped? I suspect that if you set CONFIG_CFG80211_INTERNAL_REGDB in .config, then the messages will never occur. Your speedtest.net rates are measuring the speed of your broadband connection. When I test on a local 802.11g network, I get ~18 Mbps and about 40 Mbps on an 802.11n link.
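For reference, the kernel log buffer holds 2^CONFIG_LOG_BUF_SHIFT bytes, so going from 18 to 20 quadruples it. A minimal sketch of that arithmetic (the function name is only illustrative, not a kernel interface):

```python
def log_buf_bytes(shift: int) -> int:
    """Size of the kernel log buffer in bytes for a given CONFIG_LOG_BUF_SHIFT."""
    return 1 << shift  # the buffer is 2**shift bytes


for shift in (18, 20):
    size = log_buf_bytes(shift)
    print(f"CONFIG_LOG_BUF_SHIFT={shift}: {size} bytes ({size // 1024} KiB)")
```

With the current value of 18 the buffer is 256 KiB; with 20 it is 1024 KiB.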
My broadband connection is 100Mbps down and 10Mbps up and with wired connection I get actual speeds of around 85-90Mbps down and 10-11Mbps up. The speed of broadband connection is not an issue here. And it is the reason why I'm not satisfied with 802.11g.
OK. With 802.11g, the most you will get with any driver is about 27 Mbps. With the kernel version of rtl8192cu on an 802.11n connection, I only get about 20 TX and RX. If you want the best performance, then you should get driver RTL8192CU_8188CUS_8188CE-VAU_linux_v3.1.2590.20110922 from the Realtek web site. It does up to 68 RX and 45 TX. My time has been spent making the in-kernel drivers be stable. When we get close to that condition, then I will be able to concentrate on the performance.
Created attachment 72844 [details] Connection to 802.11g with stacktraces
I recompiled the kernel with the following changes:
  - turned off CONFIG_CFG80211_WEXT
  - turned on CONFIG_EXPERT and CONFIG_CFG80211_INTERNAL_REGDB
  - increased CONFIG_LOG_BUF_SHIFT to 20
I rebooted with rtl8192ce blacklisted to prevent it from loading. Running "modprobe -v rtl8192ce && /etc/init.d/NetworkManager restart", so that NM detects and connects with the new network card, caused my log files to overflow on disk: within a few seconds my 5Mb disk log was filled with stacktraces. Rmmodding rtl8192ce stopped the flow. The second time I modprobed first without restarting NM, and then took care to rmmod the driver as soon as I saw the first stacktraces. The modprobe happens around 814.606881 and the NM restart around ~888-890. The rmmod happens just before the log file ends. At this point I don't care that much about the performance. Stability is much more important.
Attaching to the 802.11n network didn't produce any stacktraces. I connected both wlan cards to the same AP, which is running in b/g/n mode, and looked through the settings with the iw tools. At that time I saw the following:

# iw dev wlan0 station dump
Station 1c:af:xx:xx:xx:xx (on wlan0)
    inactive time: 976 ms
    rx bytes: 717993
    rx packets: 4736
    tx bytes: 81747
    tx packets: 695
    tx retries: 61
    tx failed: 3
    signal: 39 dBm
    signal avg: -48 dBm
    tx bitrate: 48.0 MBit/s
    authorized: yes
    authenticated: yes
    preamble: long
    WMM/WME: no
    MFP: no
    TDLS peer: no

xxxx drivers

# iw dev wlan1 station dump
Station 1c:af:xx:xx:xx:xx (on wlan1)
    inactive time: 8892 ms
    rx bytes: 25352
    rx packets: 204
    tx bytes: 2684
    tx packets: 26
    tx retries: 0
    tx failed: 0
    signal: -54 dBm
    signal avg: -53 dBm
    tx bitrate: 300.0 MBit/s MCS 15 40Mhz short GI
    authorized: yes
    authenticated: yes
    preamble: long
    WMM/WME: yes
    MFP: no
    TDLS peer: no

Should the signal really be negative? On wlan0 the signal avg varies wildly between negative and positive, while wlan1 stays at -53 dBm all the time.
Yes, the signal should be negative, or a small positive number. The +39 dBm is clearly bogus. The warning from ieee80211_get_tx_rate(), which is what you are seeing from rtl8192ce, is being discussed on the linux-wireless mailing list. I have submitted a patch to only issue it once. I will attach that one next.
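A rough way to spot such bogus readings automatically is to scan the "iw ... station dump" text for signal values above some cutoff. The sketch below is only an illustration: the function name and the default cutoff of 0 dBm are assumptions, not anything taken from iw or the drivers, and the pattern only handles the plain "signal: N dBm" layout shown in this report.

```python
import re

# Matches "signal: 39 dBm" and "signal avg: -48 dBm" lines as they appear in this report.
SIGNAL_RE = re.compile(r"^\s*(signal(?: avg)?):\s*(-?\d+)\s*dBm", re.MULTILINE)


def suspicious_signals(dump: str, max_dbm: int = 0):
    """Return (field, value) pairs whose dBm value exceeds max_dbm (assumed cutoff)."""
    return [
        (name, int(value))
        for name, value in SIGNAL_RE.findall(dump)
        if int(value) > max_dbm
    ]


sample = """\
Station 1c:af:xx:xx:xx:xx (on wlan0)
    signal: 39 dBm
    signal avg: -48 dBm
    tx bitrate: 48.0 MBit/s
"""

print(suspicious_signals(sample))
```

For the wlan0 excerpt above this flags only the "signal: 39 dBm" line. A small positive reading can be legitimate, so raise max_dbm if the cutoff turns out to be too strict.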
Created attachment 72846 [details] Patch to limit WARN statements from mac80211 This patch will prevent the warning in ieee80211_get_tx_rate() from spamming the logs by converting from WARN_ON to WARN_ON_ONCE.
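To illustrate why this change cuts down the log spam, here is a userspace analogy of the warn-once idea. It is not the kernel macro: the names warn_on_once and emitted are made up for this sketch, and the real WARN_ON_ONCE also dumps a stack trace.

```python
_warned_sites = set()   # call sites that have already warned
emitted = []            # record of warnings actually printed (for illustration)


def warn_on_once(condition: bool, site: str) -> bool:
    """Print a warning only the first time `condition` is true for `site`.

    Returns the condition so callers can still branch on it every time.
    """
    if condition and site not in _warned_sites:
        _warned_sites.add(site)
        emitted.append(site)
        print(f"WARNING: at {site}")
    return condition


# The bad-rate condition may be hit for every transmitted frame,
# but only the first hit produces a log entry.
for _ in range(1000):
    warn_on_once(True, "include/net/mac80211.h:1330")

print(len(emitted))
```

The condition is still evaluated on every call, so the underlying rate-control problem is not hidden, only reported once.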
Created attachment 72848 [details] The number of stacktraces went down after the patch.
The patch improved the situation, and now I get only a few stacktraces. Speedtest.net scored 8Mbps/8Mbps at best with 802.11g, which is one of the best results so far. The bogus dBm number was produced by the zd1211rw driver. I will have to investigate it later. 802.11n performs slightly worse, but does not show any regressions. No stacktraces there either. I still get a number of CRDA messages:
"Apr 07 23:36:12 [kernel] [14654.742258] cfg80211: Calling CRDA for country: FI
Apr 07 23:36:12 [kernel] [14654.754289] cfg80211: Regulatory domain changed to country: FI ........."
I just installed netperf and am now looking into it. I'll set up a testing configuration later on.
Interesting problem. I don't know if this worked before with 802.11g:

802.11n:
# iw wlan1 station dump
Station cc:5d:xx:xx:xx:xx (on wlan1)
    inactive time: 1089 ms
    rx bytes: 9985583
    rx packets: 14523
    tx bytes: 10868372
    tx packets: 11401
    tx retries: 0
    tx failed: 0
    signal: -50 dBm
    signal avg: -49 dBm
    tx bitrate: 300.0 MBit/s MCS 15 40Mhz short GI
    authorized: yes
    authenticated: yes
    preamble: long
    WMM/WME: yes
    MFP: no
    TDLS peer: no

802.11g:
# iw wlan1 station dump
failed to parse nested attributes!
After a quick test of the rtl8192ce module options with 802.11n, it seems that "swenc=1" is necessary for successful association. The current working options are:
  options cfg80211 ieee80211_regdom=FI
  options rtl8192ce swenc=1 debug=3
Strange - hardware encryption (default value of 0 for swenc) is OK here.
WARNING: at net/mac80211/tx.c:770 invoke_tx_handlers+0x85c/0x1520 [mac80211]() is due to a broken rate control setup.
WARNING: at net/mac80211/tx.c:55 invoke_tx_handlers+0x1424/0x1520 [mac80211]() is also caused by a bad rate control setup.
WARNING: at include/net/mac80211.h:1330 rtl_get_tcb_desc+0x3d8/0x5b0 [rtlwifi]() is the one that was just patched - also rate control.
Are the first two coming from rtlwifi or zd1211rw? I cannot tell from what is posted. What kernel are you using? I couldn't find it in any of the material. If possible, could you clone the git tree from git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-testing.git? That has all the latest patches. With it and rtl8192ce, I get up to 50 Mbps RX and 38 Mbps TX. If not, then I will try to get you a patch to upgrade whatever kernel you are using to mine. Those performance numbers are higher than I posted earlier because I thought you had a USB device.
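When a log contains many of these warnings, tallying them by source location can make it easier to see which ones dominate. The following is a minimal sketch that assumes the header format quoted above; the helper name count_warnings is hypothetical, and the sample repeats the last warning once purely for illustration.

```python
import re
from collections import Counter

# Header format as quoted in this report:
# WARNING: at <file>:<line> <func>+0x<off>/0x<size> [<module>]()
WARN_RE = re.compile(
    r"WARNING: at (?P<file>\S+):(?P<line>\d+) "
    r"(?P<func>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[(?P<module>\w+)\]"
)


def count_warnings(log: str) -> Counter:
    """Count WARNING headers keyed by (file, line, function, module)."""
    return Counter(
        (m["file"], int(m["line"]), m["func"], m["module"])
        for m in WARN_RE.finditer(log)
    )


sample = """\
WARNING: at net/mac80211/tx.c:770 invoke_tx_handlers+0x85c/0x1520 [mac80211]()
WARNING: at net/mac80211/tx.c:55 invoke_tx_handlers+0x1424/0x1520 [mac80211]()
WARNING: at include/net/mac80211.h:1330 rtl_get_tcb_desc+0x3d8/0x5b0 [rtlwifi]()
WARNING: at include/net/mac80211.h:1330 rtl_get_tcb_desc+0x3d8/0x5b0 [rtlwifi]()
"""

for key, count in count_warnings(sample).items():
    print(count, key)
```

The bracketed module is the one that contains the warning function, so a [mac80211] header does not by itself say which driver triggered it. The call trace that follows each header in dmesg would be needed for that.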
I have been using 3.4-rc1 with the null-pointer exception fix and the WARN_ON_ONCE fix. Network mode is 802.11n, 40Mhz, control sideband Upper, channel 13. The module options used were swenc=1 debug=3. I unplugged the zd1211rw card (wlan0) after rtl8192ce began working adequately, so the following logs contain only lines related to wlan1/rtl8192ce. My computer has now been up for four (4) days and has lost connectivity twice. Both times I rmmodded and modprobed rtl8192ce, after which I regained connectivity. The speed of the connection has varied somewhat, from really sluggish to ~7Mbps up/down. Occasionally NetworkManager reconnects and asks for the wlan password even though it remembers it. I have the logs stored, but I don't have time right now to go through and process them. I'm now cloning and building wireless-testing, after which I'll reboot and remove all module options. Does the wireless-testing git tree contain the WARN_ON_ONCE fix and/or the null-pointer fix? I also have 3.4-rc2 ready for testing, but haven't booted it yet.
Wireless-testing has both patches included. If 3.4-rc2-wl shows the warning, please post the entire dmesg output. I still do not know why the rate control routines are being set up wrong.
Created attachment 72901 [details] dmesg from 3.4.0-rc2-wl
Kernel: 3.4.0-rc2-wl
Module options:
  options cfg80211 ieee80211_regdom=FI
  rtl8192ce: defaults (none set)
First connected to 802.11n, 40Mhz, Control Sideband Upper, Channel 13, WPA2-PSK AES.
At "Apr 12 20:30:47 [kernel] [42875.488085]" I changed the same AP to 802.11g, Channel 13.
The stacktraces appear only on the 802.11g network.
Speeds (speedtest, still no netperf):
  802.11n: 5Mbps down, 10Mbps up
  802.11g: 10Mbps down, 10Mbps up
Created attachment 73192 [details] dmesg from 3.4.0-rc5-wl

# iw wlan1 station dump
Station cc:5d:4e:7a:84:e7 (on wlan1)
    inactive time: 34283 ms
    rx bytes: 93710
    rx packets: 553
    tx bytes: 24396
    tx packets: 129
    tx retries: 0
    tx failed: 0
    signal: -50 dBm
    signal avg: -50 dBm
    tx bitrate: 300.0 MBit/s MCS 15 40Mhz short GI
    authorized: yes
    authenticated: yes
    preamble: long
    WMM/WME: yes
    MFP: no
    TDLS peer: no

wireless-testing was up-to-date and in use at the time of making this comment.
Created attachment 73193 [details] Kernel config for 3.4.0-rc5-wl
Created attachment 73198 [details] Patch to fix race condition when firmware already cached This past week, a race condition was found where ieee80211 operations were started before the initialization was finished when the firmware file had previously been read by user space, and was available in cache. I think this will fix your problem. The patch has been submitted, but not yet added to wireless-testing. As soon as it is, it should be pushed to 3.4 mainline, and backported to stable.
After adding your patch, things have been quiet for at least 4 hours. I will not comment here again if the problems do not reappear. I'll try to do some extended testing during workdays. The data rate has risen and is now 10Mbps/10Mbps for 802.11n (40Mhz), which is also an improvement. Thanks for the help!
After using the patched kernel for a week I haven't encountered any problems. Has the patch been included in the wireless-testing or mainline (rcX) kernels?
The patch has been sent to wireless-testing with the recommendation that it be included in 3.4; however, it has not yet been merged with that tree, nor the mainline one. The wireless maintainer reported that his day job kept him busy last week and that he was behind.
Created attachment 73336 [details] Crash
The patched kernel ran quite well until I installed NetworkManager once again. Before that, I had been using wpa_supplicant and dhclient manually. With NetworkManager the continuous "Reason 7" deassociations appeared again, and after a while I got this crash. The kernel included the race-condition patch mentioned earlier. Before this event I had pulled and compiled 3.4.0-rc7-wl, which I am now running and with which I have not yet been able to reproduce this oops.
I can report that using any of the stock Arch Linux kernels newer than 3.4-1, or Fedora 17 with kernel 3.4.2, breaks the rtl8192cu driver. It times out trying to connect (using NetworkManager in KDE). Reverting to 3.3.8-1 solves the problem.
Created attachment 73831 [details] Patch to assure that correct USB read buffer is used This patch ensures that there is not a race condition when selecting and using the USB read buffer. I have had the patch for some time, but no real testers. If you can, please test and report.