Bug 43184
| Summary: | [Intel 6300] WLAN 802.11n connections stall regularly | | |
|---|---|---|---|
| Product: | Networking | Reporter: | unggnu |
| Component: | Wireless | Assignee: | networking_wireless (networking_wireless) |
| Status: | CLOSED DUPLICATE | | |
| Severity: | normal | CC: | braket, emmanuel.grumbach, garkein, ilw, johannes, kernel, linville, stf_xl, timshel, wey-yi.w.guy |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 3.7.1 | Subsystem: | |
| Regression: | Yes | Bisected commit-id: | |
| Attachments: | dmesg output showing connection stall | | |
Description
unggnu
2012-05-01 07:49:55 UTC
Apparently DHT is some sort of extension to the BitTorrent protocol? It seems to enable a greater number of peers? So that probably increases the number of small packets on the network? But they should all be going through the same router in most cases (i.e. out to the Internet), so I don't know if that would make a difference to the aggregation algorithm.

Have you tried the 11n_disable option to iwlwifi? While this would obviously decrease your available bandwidth, I wonder if it would help with the "stalls"?

Does your network recover from these stalls? Or are they permanent?

We seem to be getting a lot of reports about bad 802.11n performance with iwlwifi. I wonder if there is some larger problem...?

DHT is a protocol to find peers without a central host, so more small UDP packets are sent around. Running BitTorrent with DHT, and probably the usual high number of connections, is just the easiest way for me to trigger this stall, but there might be many others, like being further away from the router or using 40 MHz 802.11n. The interesting part is that if I am in the room directly behind the router, the stalls don't happen in most cases, even with BitTorrent running.

No, I haven't tried the 11n_disable option, but I am going to, thanks.

The stalls are mostly permanent, I think. Normally I disable and re-enable WLAN to get the connection back, but I know of at least one case where the connection didn't recover in about half an hour although my PC was still connected to the AP.

Yes, 802.11n was always an issue on Linux for me. For example, using 40 MHz (two channels at once) never really worked for me without disconnects or stalls.

I have only tested the 11n_disable option for two days, but I haven't had any stalls during this time, even with BitTorrent running. Since I am only using WLAN for my Internet connection, the speed is fine with me too. But it is still sad that after so many years 802.11n still doesn't work well on Linux. And I don't have some rare hardware that lacks vendor support.
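For anyone who wants to check whether the workaround discussed above is actually in effect, here is a minimal Python sketch. It assumes the module parameter is exposed as a readable file at /sys/module/iwlwifi/parameters/11n_disable; that file name is an assumption derived from the option name, and `read_11n_disable` is a hypothetical helper, not part of any iwlwifi tooling.

```python
from pathlib import Path

# Assumed sysfs location of the iwlwifi module parameter. The file name is
# inferred from the "11n_disable" option name and may differ between kernels.
PARAM_PATH = Path("/sys/module/iwlwifi/parameters/11n_disable")


def read_11n_disable(path=PARAM_PATH):
    """Return the raw parameter value as a string, or None if unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        # Module not loaded, parameter not exposed, or no read permission.
        return None


if __name__ == "__main__":
    value = read_11n_disable()
    if value is None:
        print("iwlwifi 11n_disable parameter is not readable on this system")
    else:
        print(f"11n_disable = {value}")
```

The helper only reports the raw value; how a given kernel interprets non-zero values is deliberately left out because it is not stated in this report.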
Just to confirm after a longer test: I haven't had any stalls since using the option. So how can I help to debug this?

Still an issue with 3.4, and it still works without problems when n is disabled.

Same problem here for several kernel versions. A stable workaround is disabling 11n, but that is not an acceptable solution because most Internet lines are faster than 802.11g speed.

3.4: not working
3.5-rc2: not working

But setting
CONFIG_IWLWIFI_DEBUG=y
CONFIG_IWLWIFI_DEBUG_EXPERIMENTAL_UCODE=y
results in a stable connection which, according to my first tests, has very low speed (~5 Mbit/s).

Please try these patches:
http://git.kernel.org/?p=linux/kernel/git/linville/wireless.git;a=commitdiff;h=d012d04e4d6312ea157b6cf19e9689af934f5aa7
http://git.kernel.org/?p=linux/kernel/git/linville/wireless.git;a=commitdiff;h=d6ee27eb13beab94056e0de52d81220058ca2297
We know they aren't fixing all the problems though and are working on more fixes.

I have the same problems since upgrading to 3.4 (3.3 was fine). Also using an AVM 7390 router on 5 GHz. I tried both patches, but they didn't fix the issue. It still happens.

I see the same problem. Kernel 3.3 works fine but with 3.4 wireless stalls.

From lspci:
03:00.0 Network controller: Intel Corporation Centrino Ultimate-N 6300 (rev 35)

Firmware:
iwlwifi 0000:03:00.0: loaded firmware version 9.221.4.1 build 25532

I've seen this error in dmesg sometimes when the stall happens:
[ 84.383705] iwlwifi 0000:03:00.0: Queue 2 stuck for 2000 ms.
[ 84.383708] iwlwifi 0000:03:00.0: Current SW read_ptr 161 write_ptr 169
[ 84.383760] iwlwifi 0000:03:00.0: Current HW read_ptr 161 write_ptr 169
[ 84.383762] iwlwifi 0000:03:00.0: On demand firmware reload
[ 84.388424] iwlwifi 0000:03:00.0: Failing on timeout while stopping DMA channel 1 [0x07fd0001]
[ 84.388874] ieee80211 phy0: Hardware restart was requested
[ 84.388939] iwlwifi 0000:03:00.0: L1 Enabled; Disabling L0S
[ 84.395747] iwlwifi 0000:03:00.0: Radio type=0x0-0x3-0x1

Since upgrading to 3.4.3 the issue seems to be fixed for me. Can any of the other reporters confirm this?

I do agree with Daniel that 3.4.3 seems to make the situation better, but I still see connection issues that I do not see with 3.3.x. The connection is very unreliable. While pinging a local machine in the network I get:
135 packets transmitted, 108 received, 20% packet loss, time 174895ms
rtt min/avg/max/mdev = 1.459/90.582/1326.175/175.244 ms, pipe 2

I have been unable to get any useful debug info. I tried:
echo 0x0007BBC9 > /sys/module/iwlwifi/parameters/debug
and this is the output I see:
[ 594.542058] iwlwifi 0000:03:00.0: I iwl_tt_handler Queueing thermal throttling work.
[ 596.687066] iwlwifi 0000:03:00.0: I iwl_tt_handler Queueing thermal throttling work.
[ 605.163945] iwlwifi 0000:03:00.0: I iwl_tt_handler Queueing thermal throttling work.
[ 605.467038] usb usb1: usb auto-resume
[ 605.467046] ehci_hcd 0000:00:1a.0: resume root hub
[ 605.486652] hub 1-0:1.0: hub_resume
[ 605.486669] hub 1-0:1.0: port 1: status 0507 change 0000
[ 605.486694] usb 1-1: usb auto-resume
[ 605.486753] hub 1-0:1.0: state 7 ports 3 chg 0000 evt 0000
[ 605.512656] ehci_hcd 0000:00:1a.0: GetStatus port:1 status 001005 0 ACK POWER sig=se0 PE CONNECT
[ 605.523600] usb 1-1: finish resume
[ 605.524152] hub 1-1:1.0: hub_resume
[ 605.525036] hub 1-1:1.0: port 4: status 0507 change 0000
[ 605.525515] ehci_hcd 0000:00:1a.0: reused qh ffff8802248b7780 schedule
[ 605.525525] usb 1-1: link qh256-0001/ffff8802248b7780 start 1 [1/0 us]
[ 605.525572] hub 1-1:1.0: state 7 ports 6 chg 0000 evt 0000
[ 605.526741] usb 1-1.4: usb auto-resume
[ 605.563706] usb 1-1.4: finish resume
[ 607.309948] iwlwifi 0000:03:00.0: I iwl_tt_handler Queueing thermal throttling work.
[ 608.291784] usb 1-1.4: usb auto-suspend, wakeup 0
[ 609.454331] iwlwifi 0000:03:00.0: I iwl_tt_handler Queueing thermal throttling work.
[ 610.297558] hub 1-1:1.0: hub_suspend
[ 610.297571] usb 1-1: unlink qh256-0001/ffff8802248b7780 start 1 [1/0 us]
[ 610.297969] usb 1-1: usb auto-suspend, wakeup 1
[ 612.303597] hub 1-0:1.0: hub_suspend
[ 612.303608] usb usb1: bus auto-suspend, wakeup 1
[ 612.303613] ehci_hcd 0000:00:1a.0: suspend root hub
[ 613.744611] iwlwifi 0000:03:00.0: I iwl_tt_handler Queueing thermal throttling work.
[ 615.891293] iwlwifi 0000:03:00.0: I iwl_tt_handler Queueing thermal throttling work.
[ 618.034056] iwlwifi 0000:03:00.0: I iwl_tt_handler Queueing thermal throttling work.

Enabling iwlwifi debug seems to trigger the USB resume. I have nothing plugged into the USB ports.

OK, I have to correct myself. Updating to 3.4.3 did not fully solve the issue. Now, immediately after resuming from hibernate, it got stuck again:
[19634.521745] PM: restore of devices complete after 831.979 msecs
[19634.521856] PM: Image restored successfully.
[19634.521857] Restarting tasks ... done.
[19634.522666] PM: Basic memory bitmaps freed
[19634.522780] video LNXVIDEO:00: Restoring backlight state
[19639.112923] e1000e 0000:00:19.0: irq 41 for MSI/MSI-X
[19639.214509] e1000e 0000:00:19.0: irq 41 for MSI/MSI-X
[19639.214891] ADDRCONF(NETDEV_UP): eth0: link is not ready
[19639.217209] iwlwifi 0000:03:00.0: L1 Enabled; Disabling L0S
[19639.217448] iwlwifi 0000:03:00.0: Radio type=0x0-0x3-0x1
[19639.356186] ADDRCONF(NETDEV_UP): wlan0: link is not ready
[19648.086968] wlan0: authenticate with xx:xx:xx:xx:xx:xx
[19648.094632] wlan0: send auth to xx:xx:xx:xx:xx:xx (try 1/3)
[19648.095854] wlan0: authenticated
[19648.098786] wlan0: associate with xx:xx:xx:xx:xx:xx (try 1/3)
[19648.100183] wlan0: RX AssocResp from xx:xx:xx:xx:xx:xx (capab=0x411 status=0 aid=1)
[19648.100185] wlan0: associated
[19648.105239] ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
[19648.105287] cfg80211: Calling CRDA for country: DE
[19648.972293] cfg80211: Regulatory domain changed to country: DE
[19648.972296] cfg80211: (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
[19648.972298] cfg80211: (2400000 KHz - 2483500 KHz @ 40000 KHz), (N/A, 2000 mBm)
[19648.972299] cfg80211: (5150000 KHz - 5250000 KHz @ 40000 KHz), (N/A, 2000 mBm)
[19648.972301] cfg80211: (5250000 KHz - 5350000 KHz @ 40000 KHz), (N/A, 2000 mBm)
[19648.972302] cfg80211: (5470000 KHz - 5725000 KHz @ 40000 KHz), (N/A, 2698 mBm)
[19658.940775] wlan0: no IPv6 routers present
[19663.754567] iwlwifi 0000:03:00.0: Queue 2 stuck for 2000 ms.
[19663.754579] iwlwifi 0000:03:00.0: Current SW read_ptr 78 write_ptr 83
[19663.754650] iwlwifi 0000:03:00.0: Current HW read_ptr 78 write_ptr 83
[19663.754656] iwlwifi 0000:03:00.0: On demand firmware reload
[19663.755195] ieee80211 phy0: Hardware restart was requested
[19663.755289] iwlwifi 0000:03:00.0: L1 Enabled; Disabling L0S
[19663.755472] iwlwifi 0000:03:00.0: Radio type=0x0-0x3-0x1
[19704.112187] wlan0: deauthenticated from xx:xx:xx:xx:xx:xx (Reason: 6)
[19704.130624] cfg80211: Calling CRDA to update world regulatory domain
[19704.133712] cfg80211: World regulatory domain updated:
[19704.133714] cfg80211: (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
[19704.133716] cfg80211: (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[19704.133718] cfg80211: (2457000 KHz - 2482000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
[19704.133720] cfg80211: (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
[19704.133721] cfg80211: (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[19704.133723] cfg80211: (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)

I tried to do a bisect to find when exactly this bug was introduced. I could not really finish it, at some point it is not clear if I'm seeing the same bug or a different one that got fixed later on. I'm copying the log of the partial bisect in hopes that it helps.
git bisect start
# bad: [76e10d158efb6d4516018846f60c2ab5501900bc] Linux 3.4
git bisect bad 76e10d158efb6d4516018846f60c2ab5501900bc
# good: [c16fa4f2ad19908a47c63d8fa436a1178438c7e7] Linux 3.3
git bisect good c16fa4f2ad19908a47c63d8fa436a1178438c7e7
# bad: [141124c02059eee9dbc5c86ea797b1ca888e77f7] Delete all instances of asm/system.h
git bisect bad 141124c02059eee9dbc5c86ea797b1ca888e77f7
# bad: [3b59bf081622b6446db77ad06c93fe23677bc533] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
git bisect bad 3b59bf081622b6446db77ad06c93fe23677bc533
# bad: [74dd1521d0b4f940cdd3ce7b9d988836bef589b8] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
git bisect bad 74dd1521d0b4f940cdd3ce7b9d988836bef589b8
# good: [97767a87f3be8834192dc3fc9412aaccf708d87f] be2net: Remove unused OFFSET_IN_PAGE() macro
git bisect good 97767a87f3be8834192dc3fc9412aaccf708d87f
# bad: [cc4bf501a237f5232df6d4aeb7b24ac0362958c8] Merge branch 'wl12xx-next' into for-linville
git bisect bad cc4bf501a237f5232df6d4aeb7b24ac0362958c8

I haven't had the issue since I upgraded to 3.4.4. Seems to be fixed with that revision.

Still an issue with 3.4.4 after a suspend resume:
[ 851.505743] wlan0: no IPv6 routers present
[ 899.550886] iwlwifi 0000:02:00.0: Queue 2 stuck for 2000 ms.
[ 899.550894] iwlwifi 0000:02:00.0: Current SW read_ptr 115 write_ptr 119
[ 899.550950] iwlwifi 0000:02:00.0: Current HW read_ptr 115 write_ptr 119
[ 899.550954] iwlwifi 0000:02:00.0: On demand firmware reload
[ 899.551410] ieee80211 phy0: Hardware restart was requested
[ 899.551514] iwlwifi 0000:02:00.0: L1 Enabled; Disabling L0S
[ 899.551710] iwlwifi 0000:02:00.0: Radio type=0x0-0x3-0x1

Created attachment 77291 [details]
dmesg output showing connection stall
I still see stalls in kernel 3.5.1.
In this particular instance I saw a couple of stutters, caused by firmware restarts, followed by the connection just stopping. While no packets were going through, the interface still showed as associated.
I attach the output from dmesg.
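When going through a long dmesg capture like the attached one, it can help to pull out only the stall-related events. The following Python sketch looks for the two message shapes quoted in this report ("Queue N stuck for M ms" from iwlwifi and "Hardware restart was requested" from ieee80211). The function name `summarize_stalls` and the regular expressions are illustrative assumptions; other kernel versions may word these messages differently.

```python
import re

# Patterns modelled on the dmesg lines quoted in this report; other kernels
# may format these messages differently.
STUCK_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+iwlwifi\s+(?P<dev>\S+):\s+"
    r"Queue\s+(?P<queue>\d+)\s+stuck\s+for\s+(?P<ms>\d+)\s+ms"
)
RESTART_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+ieee80211\s+\S+:\s+"
    r"Hardware restart was requested"
)


def summarize_stalls(dmesg_text):
    """Return (stuck_events, restart_timestamps) found in dmesg text.

    stuck_events is a list of (timestamp, pci_device, queue, milliseconds).
    """
    stuck = [
        (float(m["ts"]), m["dev"], int(m["queue"]), int(m["ms"]))
        for m in STUCK_RE.finditer(dmesg_text)
    ]
    restarts = [float(m["ts"]) for m in RESTART_RE.finditer(dmesg_text)]
    return stuck, restarts


if __name__ == "__main__":
    sample = """\
[ 84.383705] iwlwifi 0000:03:00.0: Queue 2 stuck for 2000 ms.
[ 84.383762] iwlwifi 0000:03:00.0: On demand firmware reload
[ 84.388874] ieee80211 phy0: Hardware restart was requested
"""
    stuck, restarts = summarize_stalls(sample)
    for ts, dev, queue, ms in stuck:
        print(f"{ts:.6f}s: queue {queue} on {dev} stuck for {ms} ms")
    print(f"{len(restarts)} hardware restart(s) requested")
```

To run it on a real capture, the saved dmesg output can be read from a file and passed to `summarize_stalls` as a single string.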
Don't take this the wrong way, but it looks like your plan is to sit this out until nobody uses this piece of hardware anymore. I mean, what is missing? It is easy to reproduce after a few minutes if you connect to an AP in the 5 GHz band, and it is a piece of hardware which isn't rare. And I am willing to debug it if you tell me how. Without disabling 802.11n this WLAN hardware is absolutely unusable under Linux because the connection stalls all the time. And 54 Mbit/s gross WLAN bandwidth is not fast enough for modern Internet connections anymore. At least have the decency to change the status from needinfo to confirmed.

The connection doesn't stall as fast as usual with kernel 3.6; nevertheless the packet drop rate is pretty high at around 15-40% while using ping, and the WLAN connection stopped after some time too. So I am still stuck with 802.11a and 54 Mbit/s.

I haven't seen any problems in a while. I'm running 3.6.8.

@Jose Have you checked your packet drop rate? I haven't had any stalls after over a day, but the drop rate is still over 10%, which is really noticeable in time-critical applications and sometimes even while loading websites.

The high packet loss is still an issue with 3.7. 10% - 30% packet loss is normal with 802.11n. If I disable n and only use 54 Mbit/s there is only 0-1% packet loss from the same location.

That could be pretty normal in a noisy environment. Slower rates are more robust, so fewer packets are lost. Overall throughput is the interesting figure; on 11n it should be better.

Downloads are normally slower with n in my case because of the packet loss. And the loading of websites often takes much longer or hangs for some seconds. That's why I always disable n again after testing. Also, n should use MIMO and thus be better equipped to deal with interference. And I am using the 5 GHz band, in which I have the only AP nearby according to my router.

So I have pinged my local router 1000 times, once with 802.11n disabled and once without.
The kernel was 3.7.1.

802.11n disabled:
1000 packets transmitted, 881 received, 11% packet loss, time 999736ms
rtt min/avg/max/mdev = 1.747/291.081/2872.046/488.365 ms, pipe 3

vs default:
1000 packets transmitted, 1000 received, 0% packet loss, time 1000385ms
rtt min/avg/max/mdev = 1.242/3.110/96.074/8.992 ms

I mixed up the headings. 802.11n disabled has zero packet loss and 802.11n enabled has 11%.

*** This bug has been marked as a duplicate of bug 56581 ***
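The ping summaries quoted in this report can be compared mechanically rather than by eye. This is a small Python sketch that recomputes the loss percentage from the "packets transmitted / received" counts; `packet_loss_percent` is a hypothetical helper and the regular expression only covers the summary format shown here.

```python
import re

# Matches the summary line format quoted in this report, e.g.
# "1000 packets transmitted, 881 received, 11% packet loss, time 999736ms"
SUMMARY_RE = re.compile(r"(\d+) packets transmitted, (\d+) received")


def packet_loss_percent(summary):
    """Recompute packet loss (in percent) from a ping summary line."""
    match = SUMMARY_RE.search(summary)
    if match is None:
        raise ValueError("no ping summary line found")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("no packets were transmitted")
    return 100.0 * (sent - received) / sent


if __name__ == "__main__":
    runs = {
        "802.11n enabled": "1000 packets transmitted, 881 received, 11% packet loss, time 999736ms",
        "802.11n disabled": "1000 packets transmitted, 1000 received, 0% packet loss, time 1000385ms",
    }
    for label, line in runs.items():
        print(f"{label}: {packet_loss_percent(line):.1f}% loss")
```

The labels in `runs` follow the reporter's correction of the swapped headings. Recomputing from the counts gives slightly more precision than the whole-number percentage in the ping output (119 lost out of 1000 is 11.9%).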