Created attachment 175081 [details] dmesg output
All, after experiencing bug 93431 I was happy to have a solution at hand. Yesterday I changed my router (Fritz!Box 7112 to Fritz!Box 7312) and now see new problems coming up (I'm not sure yet whether they are indeed related to the new router; I'll keep testing). It might also be related to bug 95941, though I'm not totally sure.
In the beginning I usually have a decent connection and low ping. Sometimes ping goes up to a few hundred ms but recovers quickly (within seconds). However, sometimes ping rises sharply and the connection stalls. (Still, I do not get the 'No buffer space available' message as in bug 93431.) After minutes, it recovers. Running ping during that period gives the following (note the dropped packets in between):
    64 bytes from 192.168.1.1: icmp_seq=3523 ttl=64 time=1.75 ms
    64 bytes from 192.168.1.1: icmp_seq=3524 ttl=64 time=11794 ms
    64 bytes from 192.168.1.1: icmp_seq=3525 ttl=64 time=11021 ms
    64 bytes from 192.168.1.1: icmp_seq=3849 ttl=64 time=1021 ms
    64 bytes from 192.168.1.1: icmp_seq=3850 ttl=64 time=28.5 ms
This was with kernel 4.0.0 and happens with both 25.16.12.0 (from bug 93431) and 25.17.12.0 (from bug 95941). See the attached output from
    echo 1 > /sys/kernel/debug/iwlwifi/0000:03:00.0/iwlmvm/fw_restart
    cat /sys/devices/virtual/devcoredump/devcd1/data > iwl.dump
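For anyone comparing ping logs across firmware versions, the excerpt in the report above can be summarised mechanically. The following is only a sketch, assuming the usual Linux ping reply format and a contiguous log; `summarize_ping` and the 1000 ms cutoff are illustrative choices, not something used in this report.

```python
import re

# Matches reply lines such as
# "64 bytes from 192.168.1.1: icmp_seq=3524 ttl=64 time=11794 ms"
REPLY = re.compile(r"icmp_seq=(\d+) .*time=([\d.]+) ms")

def summarize_ping(text, slow_ms=1000.0):
    """Return (missing, slow): the number of sequence numbers skipped between
    consecutive replies, and the (seq, ms) replies at or above slow_ms."""
    replies = [(int(seq), float(ms)) for seq, ms in REPLY.findall(text)]
    missing = 0
    for (prev, _), (cur, _) in zip(replies, replies[1:]):
        # max() ignores a negative step, e.g. when the counter wraps around
        missing += max(0, cur - prev - 1)
    slow = [(seq, ms) for seq, ms in replies if ms >= slow_ms]
    return missing, slow

# Sample taken from the report above.
sample = """\
64 bytes from 192.168.1.1: icmp_seq=3523 ttl=64 time=1.75 ms
64 bytes from 192.168.1.1: icmp_seq=3524 ttl=64 time=11794 ms
64 bytes from 192.168.1.1: icmp_seq=3525 ttl=64 time=11021 ms
64 bytes from 192.168.1.1: icmp_seq=3849 ttl=64 time=1021 ms
64 bytes from 192.168.1.1: icmp_seq=3850 ttl=64 time=28.5 ms
"""
missing, slow = summarize_ping(sample)
print(missing, slow)
```

On this sample the gap between icmp_seq 3525 and 3849 accounts for all missing replies, which matches the "dropped packets in between" remark.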
Created attachment 175091 [details] firmware dump
It's frustrating, today I wasn't even able to connect to the wireless network, so I finally switched back to the old router (which works pretty well with the firmware from bug 93431).
If you don't CC ilw@linux.intel.com, I won't be notified about the bug... Anyway, I just did that. Please let me know the configuration of the AP: I am especially interested in the bandwidth. What is your distribution?
Sorry, I was not aware of that. Fortunately I can connect to the network again (with the new AP). Strangely enough, I am not aware of having changed any relevant setting (was it related to the bad weather we had yesterday?). The new AP uses the 802.11n standard (which the previous one did not support). According to the manual it automatically determines the best bandwidth based on the number of networks in the vicinity. So I'm not sure what it was actually using when the problems occurred last time; right now it shows 20 MHz bandwidth. Channel selection is set to automatic mode. I'll play with the settings and report back. Is there anything specific you'd recommend looking for? I'm using Ubuntu Studio 14.10.
Ok - does it support 11ac by chance? I am afraid that your AP is changing to a channel that is not allowed by the regulatory database of Ubuntu, which is 2 years old... They are working on updating it. You can try to set your regulatory domain to US:
    iw reg set US
The frequency of your AP might be allowed in the database for the US. Another possibility is to update the regulatory database. See the link to the bug I added here. All this is just a guess though.
Unfortunately not, even for 11n it's restricted to 2.4GHz (I hadn't chosen that AP myself, but I got it almost for free from my ISP with a new contract). I set it to fixed channel now, this one seems to work (i.e. I can establish the connection in the first place). No connection issue so far, but need to test a bit more.
News?
Created attachment 175451 [details] dmesg output So far I had no more issues connecting to the network, thanks for your hint. Still, I get very high ping rates (20-30s) at some times. From the dmesg output it seems that I do see the queue hang as well, as in bug 95941. Nevertheless, I think that I had similar effects without the queue stuck report in dmesg. I'll need some more time to see if those things are related or not. Anyway, is there anything I could provide in the meantime to help debugging the queue hang issue?
Created attachment 175551 [details] dmesg output, no queue hang Finally I also had the issue without any indication of the queue hang in dmesg (see attached). It was not as severe as I had seen before (ping reply 2-3s for a time of ~15s, then recovered to normal). I'm on kernel 4.0.0 with 25.17.12.0. The dump doesn't work (probably not in this version?): cat: /sys/devices/virtual/devcoredump/devcd1/data: File not found. Is there any data I can provide?
The dump is created only when you have a firmware error: a firmware crash or a stuck queue. Ping replies of 2-3 s can happen if you get disconnected, not if the connection is kept. I'll leave this bug open for now. I will publish a new version of the firmware (-13.ucode) in the coming days. This version will be usable starting with 4.1. I can provide a backport-based version of the driver if you are interested in testing this new version.
I get the impression that the issue is triggered only if there is some traffic happening. I have ms ping rates for a long time when I leave it running alone. Starting a file download or YouTube video usually triggers ping rates to go up (now I've seen up to 10s). Downloads will then creep at a few kb/s for some time, until it recovers at some point and I get > 500kb/s rates. I'd be happy to look into that further when -13 is out. No problem to install 4.1 to do so when it is out.
Ok. I can send a backport based driver so that you can run our latest driver / firmware without changing your base kernel. Let me know.
Please leave me a note when -13 is published. I think there's no need for a backport, I'll try with 4.1rc1.
Created attachment 175711 [details] Core10 firmware -13.ucode here you go.
Thanks, Emmanuel. I won't have time to test it right away, might take up to a week to do so. I'll keep you posted.
Sadly, I am getting this with the firmware from bug 93431.
ping
------
    64 bytes from 192.168.1.1: icmp_seq=12514 ttl=254 time=2241 ms
    64 bytes from 192.168.1.1: icmp_seq=12515 ttl=254 time=1573 ms
    64 bytes from 192.168.1.1: icmp_seq=12516 ttl=254 time=607 ms
    64 bytes from 192.168.1.1: icmp_seq=12517 ttl=254 time=648 ms
    ping: sendmsg: No buffer space available
    ping: sendmsg: No buffer space available
    ping: sendmsg: No buffer space available
    ping: sendmsg: No buffer space available
journalctl logs firmware version
----------------------------------
    May 04 18:07:20 shatrupa kernel: iwlwifi 0000:04:00.0: loaded firmware version 25.16.12.0 op_mode iwlmvm
Sorry, I couldn't collect the core this time. I will try to do it the next time it happens. I can also try this new firmware but I cannot run kernel 4.0. If you can give the kernel for 3.19 series then I can test it out.
You should be running 25.17.12.0. Please check with this firmware.
@Emmanuel May 05 00:01:55 shatrupa kernel: iwlwifi 0000:04:00.0: loaded firmware version 25.17.12.0 op_mode iwlmvm 64 bytes from 192.168.1.1: icmp_seq=1846 ttl=254 time=1094 ms 64 bytes from 192.168.1.1: icmp_seq=1847 ttl=254 time=4517 ms 64 bytes from 192.168.1.1: icmp_seq=1848 ttl=254 time=3867 ms 64 bytes from 192.168.1.1: icmp_seq=1849 ttl=254 time=3109 ms 64 bytes from 192.168.1.1: icmp_seq=1851 ttl=254 time=1967 ms 64 bytes from 192.168.1.1: icmp_seq=1852 ttl=254 time=4545 ms 64 bytes from 192.168.1.1: icmp_seq=1853 ttl=254 time=6021 ms 64 bytes from 192.168.1.1: icmp_seq=1854 ttl=254 time=5742 ms 64 bytes from 192.168.1.1: icmp_seq=1857 ttl=254 time=3461 ms 64 bytes from 192.168.1.1: icmp_seq=1858 ttl=254 time=3000 ms 64 bytes from 192.168.1.1: icmp_seq=1859 ttl=254 time=2073 ms 64 bytes from 192.168.1.1: icmp_seq=1860 ttl=254 time=1666 ms 64 bytes from 192.168.1.1: icmp_seq=1861 ttl=254 time=1465 ms 64 bytes from 192.168.1.1: icmp_seq=1862 ttl=254 time=1107 ms Don't know why but on some days the problem is more pronounced than others.
The firmware team ported a fix to the -12.ucode: https://git.kernel.org/cgit/linux/kernel/git/iwlwifi/linux-firmware.git/plain/iwlwifi-7260-12.ucode I'd be grateful if you could test it. Thanks.
Testing this: May 05 18:41:39 shatrupa kernel: iwlwifi 0000:04:00.0: loaded firmware version 25.18.12.0 op_mode iwlmvm I'll let you know if something goes wrong.
I think it's worse:
Ping
-----
    64 bytes from 192.168.1.1: icmp_seq=106 ttl=254 time=1.47 ms
    64 bytes from 192.168.1.1: icmp_seq=107 ttl=254 time=13.3 ms
    64 bytes from 192.168.1.1: icmp_seq=108 ttl=254 time=7.13 ms
    64 bytes from 192.168.1.1: icmp_seq=109 ttl=254 time=564 ms
    64 bytes from 192.168.1.1: icmp_seq=110 ttl=254 time=929 ms
    64 bytes from 192.168.1.1: icmp_seq=111 ttl=254 time=447 ms
    64 bytes from 192.168.1.1: icmp_seq=112 ttl=254 time=235 ms
    64 bytes from 192.168.1.1: icmp_seq=113 ttl=254 time=383 ms
    64 bytes from 192.168.1.1: icmp_seq=114 ttl=254 time=1011 ms
    64 bytes from 192.168.1.1: icmp_seq=115 ttl=254 time=35.2 ms
    64 bytes from 192.168.1.1: icmp_seq=116 ttl=254 time=1224 ms
    64 bytes from 192.168.1.1: icmp_seq=117 ttl=254 time=1682 ms
    64 bytes from 192.168.1.1: icmp_seq=118 ttl=254 time=2055 ms
    64 bytes from 192.168.1.1: icmp_seq=119 ttl=254 time=1810 ms
    64 bytes from 192.168.1.1: icmp_seq=120 ttl=254 time=1617 ms
    64 bytes from 192.168.1.1: icmp_seq=121 ttl=254 time=1211 ms
    64 bytes from 192.168.1.1: icmp_seq=122 ttl=254 time=1437 ms
See the instant jump in latency on opening a youtube video.
Firmware Version
------------------
Cross checking again:
    May 05 19:19:18 shatrupa kernel: iwlwifi 0000:04:00.0: loaded firmware version 25.18.12.0 op_mode iwlmvm
Attaching dmesg and dump
Created attachment 175871 [details] Firmware dump for 25.18.12.0
Created attachment 175881 [details] dmesg output for 25.18.12.0
this was a production firmware and hence the dump is useless. Thanks anyway.
@Emmanuel I can try out debug firmwares if you can provide me any. By the way, 25.18.12.0 is so bad that I am now getting: ping: sendmsg: No buffer space available ping: sendmsg: No buffer space available ping: sendmsg: No buffer space available ping: sendmsg: No buffer space available ping: sendmsg: No buffer space available ping: sendmsg: No buffer space available ping: sendmsg: No buffer space available Reverting back to 25.17.12.0.
ok - thanks... let me know if reverting back to 25.17.12.0 helps...
No respite. I haven't hit the buffer space error with 25.17.12.0, but that might be accidental.
    PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
    64 bytes from 192.168.1.1: icmp_seq=1 ttl=254 time=14303 ms
    64 bytes from 192.168.1.1: icmp_seq=2 ttl=254 time=16518 ms
    64 bytes from 192.168.1.1: icmp_seq=3 ttl=254 time=16148 ms
    64 bytes from 192.168.1.1: icmp_seq=4 ttl=254 time=17028 ms
    64 bytes from 192.168.1.1: icmp_seq=5 ttl=254 time=17185 ms
    64 bytes from 192.168.1.1: icmp_seq=6 ttl=254 time=18344 ms
I'll wait till you guys have something more to say about it. This happens to be the only card that came with my Thinkpad and I'm facing a lot of trouble in my day-to-day work (VPN etc.). I remember specifically customizing the TP to get the Intel card instead of the inbuilt TP one :)
Created attachment 175921 [details] 25.12.17.0 with uSniffer Please create a dump with the firmware attached. I will forward the information to the firmware team. Thank you.
Thanks. Here you go journalctl ------------ May 06 00:19:57 shatrupa kernel: iwlwifi 0000:04:00.0: loaded firmware version 25.17.12.0 op_mode iwlmvm Ping instance where it suddenly deteriorated ---------------------------------------------- 64 bytes from 192.168.1.1: icmp_seq=113 ttl=254 time=11.4 ms 64 bytes from 192.168.1.1: icmp_seq=114 ttl=254 time=2.13 ms 64 bytes from 192.168.1.1: icmp_seq=115 ttl=254 time=4.68 ms 64 bytes from 192.168.1.1: icmp_seq=116 ttl=254 time=4.39 ms 64 bytes from 192.168.1.1: icmp_seq=117 ttl=254 time=3.85 ms 64 bytes from 192.168.1.1: icmp_seq=118 ttl=254 time=6.68 ms 64 bytes from 192.168.1.1: icmp_seq=119 ttl=254 time=923 ms 64 bytes from 192.168.1.1: icmp_seq=121 ttl=254 time=5632 ms 64 bytes from 192.168.1.1: icmp_seq=122 ttl=254 time=7788 ms 64 bytes from 192.168.1.1: icmp_seq=123 ttl=254 time=6788 ms 64 bytes from 192.168.1.1: icmp_seq=124 ttl=254 time=5951 ms 64 bytes from 192.168.1.1: icmp_seq=125 ttl=254 time=4975 ms 64 bytes from 192.168.1.1: icmp_seq=126 ttl=254 time=4383 ms 64 bytes from 192.168.1.1: icmp_seq=129 ttl=254 time=5336 ms
Created attachment 175931 [details] uSniffer iwl dump for 25.12.17.0
Created attachment 175941 [details] Corresponding dmesg output for above.
data has been transferred to the firmware team.
Created attachment 176031 [details] dmesg with -13 firmware Emmanuel, this is what I get with -13 firmware and 4.1rc2. Won't connect to the network.
Wow.... it works for me... I am not involved in the firmware, I am just delivering it...
Created attachment 176041 [details] Core10 firmware -13.ucode here is a new version of the -13.ucode.... Let me know.
Ah... I see. You are missing a patch that is on its way upstream... You can pull iwlwifi-fixes.git or wait for -rc3. I can also send a backport version.
I made a discovery. I listen to music through a bluetooth speaker. So the speaker is invariably on when I run a youtube video. Today it was off... and the moment I turned it on, see the jump in latency... 64 bytes from 192.168.1.1: icmp_seq=2729 ttl=254 time=1.75 ms 64 bytes from 192.168.1.1: icmp_seq=2730 ttl=254 time=3.98 ms 64 bytes from 192.168.1.1: icmp_seq=2731 ttl=254 time=5.22 ms 64 bytes from 192.168.1.1: icmp_seq=2732 ttl=254 time=1.74 ms 64 bytes from 192.168.1.1: icmp_seq=2733 ttl=254 time=443 ms 64 bytes from 192.168.1.1: icmp_seq=2734 ttl=254 time=157 ms 64 bytes from 192.168.1.1: icmp_seq=2735 ttl=254 time=2163 ms 64 bytes from 192.168.1.1: icmp_seq=2736 ttl=254 time=2463 ms 64 bytes from 192.168.1.1: icmp_seq=2737 ttl=254 time=2198 ms 64 bytes from 192.168.1.1: icmp_seq=2738 ttl=254 time=1228 ms 64 bytes from 192.168.1.1: icmp_seq=2739 ttl=254 time=1008 ms 64 bytes from 192.168.1.1: icmp_seq=2740 ttl=254 time=187 ms Now with this latency, the video was about to get stuck when I immediately switched it off. 
The latency took some time to recover but when it did, it did not climb back again 64 bytes from 192.168.1.1: icmp_seq=2802 ttl=254 time=3154 ms 64 bytes from 192.168.1.1: icmp_seq=2803 ttl=254 time=2950 ms 64 bytes from 192.168.1.1: icmp_seq=2804 ttl=254 time=2908 ms 64 bytes from 192.168.1.1: icmp_seq=2805 ttl=254 time=2287 ms 64 bytes from 192.168.1.1: icmp_seq=2806 ttl=254 time=1335 ms 64 bytes from 192.168.1.1: icmp_seq=2807 ttl=254 time=475 ms 64 bytes from 192.168.1.1: icmp_seq=2808 ttl=254 time=11.4 ms 64 bytes from 192.168.1.1: icmp_seq=2809 ttl=254 time=2.19 ms 64 bytes from 192.168.1.1: icmp_seq=2810 ttl=254 time=3.35 ms 64 bytes from 192.168.1.1: icmp_seq=2811 ttl=254 time=1.71 ms 64 bytes from 192.168.1.1: icmp_seq=2812 ttl=254 time=3.62 ms 64 bytes from 192.168.1.1: icmp_seq=2813 ttl=254 time=1.21 ms I again switched on the speaker and the latency jumped yet again 64 bytes from 192.168.1.1: icmp_seq=2980 ttl=254 time=5.37 ms 64 bytes from 192.168.1.1: icmp_seq=2981 ttl=254 time=1.06 ms 64 bytes from 192.168.1.1: icmp_seq=2982 ttl=254 time=5.96 ms 64 bytes from 192.168.1.1: icmp_seq=2983 ttl=254 time=1.45 ms 64 bytes from 192.168.1.1: icmp_seq=2984 ttl=254 time=39.4 ms 64 bytes from 192.168.1.1: icmp_seq=2985 ttl=254 time=906 ms 64 bytes from 192.168.1.1: icmp_seq=2986 ttl=254 time=869 ms 64 bytes from 192.168.1.1: icmp_seq=2987 ttl=254 time=1271 ms 64 bytes from 192.168.1.1: icmp_seq=2988 ttl=254 time=1554 ms I repeated the process many times with repeatable results. Maybe the kernel has a role here? @Stefan : Do you by chance use a bluetooth device too? @Emmanuel: Let me know what you think about it. I can collect any logs that you might want.
Using bluetooth has a major impact on WiFi, especially if you are connected in 2.4GHz. I always assume that users don't use bluetooth because they typically tell me when they do use bluetooth. Not in this case :) Having a high latency when bluetooth is active is normal, but I don't know if such a high latency is acceptable. I'd have to check. I'd love to hear from other users whether the bug they are seeing also occurs when they use bluetooth. Thanks.
Hmmm... I hadn't the faintest clue that they could be interfering. I shifted to Channel 13 in WiFi, towards the tail end of the ISM band, and things improved dramatically. Now the latencies do increase but only by a few hundred ms. They went to the seconds range but only for 1-2 pings. I've played three 1080p videos back to back now with no videos getting stuck. Signs of a victory... Hoping it holds for the next couple of days :) Thanks for the pointer!
Ok - but there is clearly a bug. Running 1080p with BT A2DP is definitely something that should be working. The problem is that debugging these issues is a pain because I have no clue about bluetooth's version, known issues etc... The BT firmware is under /lib/firmware/intel/. I can see that a few BT firmwares were updated in February. I'd try to take the latest linux-firmware.git and copy the content of the intel/ dir to your /lib/firmware. Then you'll need to reboot.
@Emmanuel Did as you told. Copied the new firmware into my intel/ directory. Also, reverted back to my old channel. Things are working correctly as of now. I'll watch it for a couple of days in case it's a fluke.
That's the problem with BT Coex issues, they can be either in BT or in WiFi :)
*sigh* Hit it again:
    64 bytes from 192.168.1.1: icmp_seq=83 ttl=254 time=1311 ms
    64 bytes from 192.168.1.1: icmp_seq=84 ttl=254 time=1914 ms
    64 bytes from 192.168.1.1: icmp_seq=85 ttl=254 time=3088 ms
    64 bytes from 192.168.1.1: icmp_seq=88 ttl=254 time=2440 ms
    64 bytes from 192.168.1.1: icmp_seq=89 ttl=254 time=2357 ms
    64 bytes from 192.168.1.1: icmp_seq=90 ttl=254 time=2881 ms
    64 bytes from 192.168.1.1: icmp_seq=91 ttl=254 time=2797 ms
    64 bytes from 192.168.1.1: icmp_seq=92 ttl=254 time=2858 ms
    64 bytes from 192.168.1.1: icmp_seq=93 ttl=254 time=2939 ms
    64 bytes from 192.168.1.1: icmp_seq=94 ttl=254 time=3488 ms
    64 bytes from 192.168.1.1: icmp_seq=95 ttl=254 time=3196 ms
    64 bytes from 192.168.1.1: icmp_seq=96 ttl=254 time=2938 ms
    64 bytes from 192.168.1.1: icmp_seq=97 ttl=254 time=2321 ms
    64 bytes from 192.168.1.1: icmp_seq=98 ttl=254 time=1475 ms
The latencies have definitely improved with the new intel/* firmwares, as in I don't see them going beyond 2-3s ... and the 1080p video, when stuck, recovers faster. Back to my channel changing again. Can we pull some bluetooth firmware guys into this bug?
I don't know about pulling in a BT guy, but we can collect logs from the firmware that would allow us to debug this. For that, I'll need to take a customized firmware from the firmware team, but that specific area is in transition there... *sigh*....
Looks like there are different issues going on. I'm not using bluetooth much. In fact, right now it's disabled - at least in blueman-applet, I don't know if that actually disables some hardware part. @Comment 36: I'm using Ubuntu packages to install the kernel updates; I haven't set up the environment to compile a custom kernel. So I'd rather wait for rc3 to come out, then test firmware -13.
@Stefan: Thanks for your input. It doesn't matter if your bluetooth is enabled in hardware or not. I noticed that my trouble begins when there is bluetooth *traffic* between my laptop and speaker. Since you do not have any device attached, it's not exactly my setup as far as BT is concerned. @Emmanuel: Will wait for your guidance on how to proceed with this bluetooth issue. Maybe I'm hitting Stefan's issue as well as this new issue.
AAAAARG! and help!! I used the firmware you suggested with the kernel from iwlwifi-fixes. I got great speeds etc for aaaages, then after suspend it was back to flaky slow transfers, high ping times etc. BUT worst of all my machine now takes an age to boot with the following error (on _all_ kernels and usb-live iso)
    usb 1-6 device read descriptor read/64, error -110
    /devices/pci0000:00/0000:00:14.0/usb1 is taking to long
AND I have no bluetooth on any kernel, which I am using and is essential for getting the thesis I am working on done (hair falling out stress etc). I have tried all -12 firmwares from 143 up, no change. Bluetooth seems dead!!! Attached is a dump with dmesg as well for when the connection went from flaky to fail. Is there a workaround to get the bt back up???
Created attachment 176161 [details] iwl dump and dmesg snip
Crisis averted somehow. About half an hour after booting with an older iwlwifi-7260-12 firmware bluetooth mysteriously starts again and boot is fast without errors. Weird as, but good 8)
@sheksis Please check the module parameter bt_coex_active. Please open a new bug mentioning the bt use case Thanks
Created attachment 176181 [details] new bluetooth firmware New bluetooth firmware has been submitted to linux-firmware.git but not accepted yet. Attached the same file. Try this to see if it can improve the quality. Copy this file to /lib/firmware/intel and restart the system (cold reboot)
64 bytes from 192.168.1.1: icmp_seq=56 ttl=254 time=1292 ms 64 bytes from 192.168.1.1: icmp_seq=57 ttl=254 time=760 ms 64 bytes from 192.168.1.1: icmp_seq=58 ttl=254 time=2056 ms 64 bytes from 192.168.1.1: icmp_seq=59 ttl=254 time=1351 ms 64 bytes from 192.168.1.1: icmp_seq=60 ttl=254 time=1405 ms 64 bytes from 192.168.1.1: icmp_seq=61 ttl=254 time=1266 ms 64 bytes from 192.168.1.1: icmp_seq=62 ttl=254 time=1863 ms 64 bytes from 192.168.1.1: icmp_seq=63 ttl=254 time=1271 ms 64 bytes from 192.168.1.1: icmp_seq=64 ttl=254 time=858 ms 64 bytes from 192.168.1.1: icmp_seq=65 ttl=254 time=504 ms 64 bytes from 192.168.1.1: icmp_seq=66 ttl=254 time=985 ms 64 bytes from 192.168.1.1: icmp_seq=67 ttl=254 time=1259 ms 64 bytes from 192.168.1.1: icmp_seq=68 ttl=254 time=590 ms 64 bytes from 192.168.1.1: icmp_seq=69 ttl=254 time=351 ms 64 bytes from 192.168.1.1: icmp_seq=70 ttl=254 time=810 ms 64 bytes from 192.168.1.1: icmp_seq=71 ttl=254 time=276 ms 64 bytes from 192.168.1.1: icmp_seq=72 ttl=254 time=1130 ms 64 bytes from 192.168.1.1: icmp_seq=73 ttl=254 time=763 ms 64 bytes from 192.168.1.1: icmp_seq=74 ttl=254 time=411 ms 64 bytes from 192.168.1.1: icmp_seq=75 ttl=254 time=1088 ms 64 bytes from 192.168.1.1: icmp_seq=76 ttl=254 time=1468 ms 64 bytes from 192.168.1.1: icmp_seq=77 ttl=254 time=1806 ms 64 bytes from 192.168.1.1: icmp_seq=78 ttl=254 time=1632 ms 64 bytes from 192.168.1.1: icmp_seq=79 ttl=254 time=1153 ms The 1080p video ultimately got stuck with these latencies. @Ted Let me know what logs I can provide you. Filed bug97921 for this issue as Emmanuel had asked. Lets shift the bluetooth investigations there.
@Stefan: I got feedback from the firmware team. The link looks *very* bad. Both sides are using the lowest rate and even with this lowest rate, there are *tons* of failures. This is really unusual. Since you are seeing that with the new AP only, can you please check that the firmware of the AP is up to date? The beacon timing of the AP looks very bad: the AP is sending beacons at the wrong times, which means that we will miss the beacons when we are power saving. The timing function of the AP also looks wrong. The timer of the AP seems to be lying. You can try to disable power save [1], but I don't see this as the source of all your problems. The AP is really behaving fishy.
[1] sudo iw wlan0 set power_save off
another thing. Please try to create a dump immediately after a ping latency of 10ms and above. This might shed more light. If you can, we'd appreciate if you could create a dump with the first AP (the good one) for comparison. Also - are the APs at the same place? thanks.
yet another thing :) please send the output of iw wlan0 link. Thanks.
Created attachment 176441 [details] dmesg with -13 firmware, -rc3
Emmanuel, I tried -13 firmware with -rc3 (Comment 36), but I still get quite some errors in dmesg and no connection. Did the patch make it to -rc3?
---
Anyway thanks for the feedback. I didn't expect the AP to be the root of the issues, I only had good experience with the manufacturer so far. It is using the latest AP firmware, actually. For taking a new dump, should I wait until I have -13 up and running or get the dump with -12? Does it matter whether I create the dump with power save on or off? And finally as requested:
iw wlan0 link
    SSID: Netzbox Lu
    freq: 2437
    RX: 2054577 bytes (5636 packets)
    TX: 615134 bytes (4363 packets)
    signal: -73 dBm
    tx bitrate: 48.0 MBit/s
    bss flags: short-preamble short-slot-time
    dtim period: 1
    beacon int: 100
Note that the bitrate is fluctuating heavily; during times of high ping latency I see
    signal: -68 dBm
    tx bitrate: 11.0 MBit/s
and
    signal: -78 dBm
    tx bitrate: 1.0 MBit/s
My fixes didn't make it into -rc3. They are in net.git right now... Please use -12.ucode to create the dump after > 10s latency. Please create it with power save disabled. Your beacon interval is 100. Good. That's reasonable. Your bitrate is very low when the pings get very slow. This is exactly what we saw in the dump you initially created. Do you have another device that works well with this AP? Do you have another Linux system that could record data on the air? (Another Intel device on a Linux machine can do that.)
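Catching the right moment by hand is awkward, so the dump request above could be scripted. This is only a sketch: the two paths are copied from the first comment of this report (the PCI address and the devcd index differ between machines), and `latency_ms`, `take_dump` and `watch` are illustrative helpers, not part of any iwlwifi tooling. The threshold is a parameter because the right value depends on what is being debugged.

```python
import os
import re
import subprocess
import time

# Placeholders taken from the first comment; adjust to the local machine.
FW_RESTART = "/sys/kernel/debug/iwlwifi/0000:03:00.0/iwlmvm/fw_restart"
DEVCD_DATA = "/sys/devices/virtual/devcoredump/devcd1/data"

TIME_RE = re.compile(r"time=([\d.]+) ms")

def latency_ms(line):
    """Return the round-trip time of a ping reply line in ms, or None."""
    m = TIME_RE.search(line)
    return float(m.group(1)) if m else None

def take_dump(out="iwl.dump", wait_s=5.0):
    """Restart the firmware, then save the resulting devcoredump to out."""
    with open(FW_RESTART, "w") as f:        # echo 1 > .../fw_restart
        f.write("1")
    deadline = time.monotonic() + wait_s
    while not os.path.exists(DEVCD_DATA):   # the dump may take a moment to appear
        if time.monotonic() > deadline:
            raise FileNotFoundError(DEVCD_DATA)
        time.sleep(0.1)
    with open(DEVCD_DATA, "rb") as src, open(out, "wb") as dst:
        dst.write(src.read())               # cat .../data > iwl.dump

def watch(host="192.168.1.1", threshold_ms=10000.0):
    """Ping host and take one dump as soon as a reply reaches threshold_ms."""
    ping = subprocess.Popen(["ping", host], stdout=subprocess.PIPE, text=True)
    try:
        for line in ping.stdout:
            ms = latency_ms(line)
            if ms is not None and ms >= threshold_ms:
                take_dump()
                return ms
    finally:
        ping.terminate()
```

Calling `watch()` needs root, because debugfs is only writable by root, and it assumes power save was already disabled with the `iw` command given earlier in this thread.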
Created attachment 176491 [details] firmware dump, new and old AP
Did you mean 10s or 10ms latency? See attached the dump with 25.17.12.0 for the old and new AP. For the new AP latency was ~5s when I did the fw_restart. The old AP is more stable; ping latency was at ~100ms. Both APs were located in exactly the same place. For comparison:
Old AP: iw wlan0 link
    SSID: Netzbox Lu
    freq: 2427
    RX: 37395346 bytes (29417 packets)
    TX: 2088375 bytes (19607 packets)
    signal: -61 dBm
    tx bitrate: 36.0 MBit/s
    bss flags: short-slot-time
    dtim period: 3
    beacon int: 100
Besides the 7260 machine, I have a Windows computer here and three Android mobile devices. Even with the new AP there are no obvious major issues on the other devices. I have an old Linux machine around, or could use a live USB to boot the Windows machine into Linux and set up wireshark there. I need to check the wireless cards, though.
Okay, today even with bluetooth off, I got the "no buffer" error. That's the most degenerate case of high latencies, I guess. After some time, I also get
    $ ping 192.168.1.1
    PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
    From 192.168.1.4 icmp_seq=1 Destination Host Unreachable
    From 192.168.1.4 icmp_seq=2 Destination Host Unreachable
    From 192.168.1.4 icmp_seq=3 Destination Host Unreachable
However, the other devices connected to my AP keep functioning correctly. I was able to ping the AP through a terminal on my Android phone.
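To compare the two APs, or to log signal and bitrate while ping latency is high, the `iw wlan0 link` output quoted in this thread can be parsed. A rough sketch, assuming the field labels shown in the two listings; `parse_link` and `channel_2ghz` are illustrative names, and the channel mapping covers the 2.4 GHz band only.

```python
import re

def channel_2ghz(freq_mhz):
    """Map a 2.4 GHz centre frequency in MHz to its channel number, or None."""
    if freq_mhz == 2484:
        return 14
    if 2412 <= freq_mhz <= 2472 and (freq_mhz - 2407) % 5 == 0:
        return (freq_mhz - 2407) // 5
    return None

def parse_link(text):
    """Extract freq, channel, signal (dBm) and tx bitrate (MBit/s), where present."""
    fields = {}
    m = re.search(r"freq: (\d+)", text)
    if m:
        fields["freq"] = int(m.group(1))
        fields["channel"] = channel_2ghz(fields["freq"])
    m = re.search(r"signal: (-?\d+) dBm", text)
    if m:
        fields["signal_dbm"] = int(m.group(1))
    m = re.search(r"tx bitrate: ([\d.]+) MBit/s", text)
    if m:
        fields["tx_mbit"] = float(m.group(1))
    return fields

# Values as posted in this thread for the new and the old AP.
new_ap = parse_link("""SSID: Netzbox Lu
freq: 2437
signal: -73 dBm
tx bitrate: 48.0 MBit/s
""")
old_ap = parse_link("""SSID: Netzbox Lu
freq: 2427
signal: -61 dBm
tx bitrate: 36.0 MBit/s
""")
print(new_ap)
print(old_ap)
```

Feeding it the output of `iw wlan0 link` once per second next to a running ping would show whether the bitrate drop and the latency spikes line up.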
Also, the above was when I wasn't even playing any videos. So the traffic was minuscule.
Were you able to create a firmware dump when that happens? Please record tracing. I'll look at the tracing and send the firmware dump to the firmware people. I'd also like to know if restarting the firmware (with the debugfs hook) helps. Thanks.
No, I wasn't able to create the dump. This time I've set up the uSniffer firmware. Once the issue gets hit, I'll update this bug.
@Emmanuel So I have a question: when recording the trace, the wiki has several switches listed too. Do you want any specific switches, or should I just do
    $ sudo trace-cmd record -e iwlwifi
That should be enough.
Emmanuel, to create attachment 176491 [details] I didn't have tracing running - do I need to recreate those while tracing is active?
No - you don't.
4.1-rc4 can use -13.ucode
Obviously that's another story, but rc4 doesn't boot here at all...
please ignore the -13.ucode in this bug and take the -13.ucode from: https://git.kernel.org/cgit/linux/kernel/git/iwlwifi/linux-firmware.git/tree/ It contains more bug fixes.
Emmanuel, another interesting observation: To do some testing I've been switching a lot between the old and new AP. Both are configured to provide the same SSID, so I never run them at the same time. It happens frequently that after switching the AP, the card doesn't find the network any more. It will list other networks in the vicinity, but not my own. Reloading the iwlwifi module fixes this. No problem with other devices. (This is still with 25.17.12.0 on 4.0. I'll test more as soon as I get 4.1 running.)
That's a separate issue. Maybe due to scan offload. Not sure.
Unfortunately, it's still there: 64 bytes from 192.168.1.1: icmp_seq=54 ttl=64 time=3.84 ms 64 bytes from 192.168.1.1: icmp_seq=55 ttl=64 time=810 ms 64 bytes from 192.168.1.1: icmp_seq=56 ttl=64 time=683 ms 64 bytes from 192.168.1.1: icmp_seq=57 ttl=64 time=384 ms 64 bytes from 192.168.1.1: icmp_seq=58 ttl=64 time=592 ms 64 bytes from 192.168.1.1: icmp_seq=59 ttl=64 time=3209 ms with 25.27.13.0 on 4.1-rc5.
-13.ucode is now officially published. You can get it here: https://git.kernel.org/cgit/linux/kernel/git/iwlwifi/linux-firmware.git/ It includes a few more fixes than the versions I attached to this bug. Note that the version number might be the same. I encourage everybody to move to the version from the git repository.
Created attachment 178531 [details] Core10 firmware -13.ucode with uSniffer Can you please create a firmware dump with the firmware attached? This is a debug version of the same firmware in the git repository. You'll need to create the dump immediately after the ping time is high. When someone creates a dump, please let me know what was the channel used and the bandwidth. Thank you.
This is 7265, I'd need 7260. I assume they are not compatible?
Created attachment 178581 [details] Core10 firmware -13.ucode with uSniffer Here you go.
Created attachment 178611 [details] firmware dump with -13 firmware Thanks. Attached find the firmware dump at ~5000ms ping time, kernel 4.1-rc6, wifi channel 6, 2437 MHz. Is there a way to get the bandwidth info from the driver? My AP settings are somewhat cryptic, I'd assume that it is using 20MHz only, but I'm not sure. It is set to b+g mode (802.11n disabled).
11n disabled means it can't be 40 MHz.
Today I tried with your latest attached -13 firmware and today's iwlwifi-fixes kernel. Moving bulk data with rsync or whatever whilst pinging google shows that all is good. Transfer speeds were not amazing, 3-5Mb/s. But, there is always a but, bluetooth does not work. The device is detected but does not allow any external peripherals to connect. What info can I give you please?
    Connected to 00:22:3f:e5:ab:90 (on wlan0)
    SSID: TempleThumper
    freq: 2462
    RX: 3315480255 bytes (2145912 packets)
    TX: 28941464 bytes (285533 packets)
    signal: -48 dBm
    tx bitrate: 117.0 MBit/s MCS 14
    bss flags: short-preamble short-slot-time
    dtim period: 3
    beacon int: 100
=== dmesg ===
    [ 257.850454] Bluetooth: HIDP (Human Interface Emulation) ver 1.2
    [ 257.850460] Bluetooth: HIDP socket layer initialized
    [ 283.640849] Bluetooth: Unexpected continuation frame (len 1)
    [ 411.330340] Bluetooth: hci0 command 0x0804 tx timeout
    [ 495.561361] Bluetooth: hci0 command 0x041f tx timeout
    [ 497.567444] Bluetooth: hci0 command 0x0406 tx timeout
BT is not tracked in this bug. Thank you for the report.
Unfortunately no improvement with kernel 4.1.1 and 25.30.13.0. However, I discovered today that the situation is much better on channel 1. I've been trying with channel 3, 6, 11, and 13 - and had the issue on all those. Now I switched to channel 1 and it seems to be much better. I had not expected this, since scanning for other networks shows that channel 1 is pretty crowded (8 other networks on that channel, plus another one on channel 3). Any clues already how to fix this? Btw, I also submitted a new bug 100961 regarding the issue described in comment 71.
(In reply to Stefan Soeffing from comment #82)
> Unfortunately no improvement with kernel 4.1.1 and 25.30.13.0.
>
> However, I discovered today that the situation is much better on channel 1.
> I've been trying with channel 3, 6, 11, and 13 - and had the issue on all
> those. Now I switched to channel 1 and it seems to be much better.
>
> I had not expected this, since scanning for other networks shows that
> channel 1 is pretty crowded (8 other networks on that channel, plus another
> one on channel 3).
>
> Any clues already how to fix this?
>
Please create a dump on Channel 1 and another one on a "bad" channel. I will forward the data to the firmware team.
>
> Btw, I also submitted a new bug 100961 regarding the issue described in
> comment 71.
I just added Intel to that bug. Please add ilw@linux.intel.com on any Intel WiFi Linux related issues.
I will create one with channel 1. Is the dump from comment 78 enough for comparison?
(In reply to Stefan Soeffing from comment #84)
> I will create one with channel 1. Is the dump from comment 78 enough for
> comparison?

Probably yes.
Created attachment 182641 [details] firmware dump, ch 1 This is a firmware dump on channel 1, firmware 25.30.13.0. To be honest, I'm not completely sure if that channel is really "better". It's more of a feeling than hard evidence. I can watch youtube videos with (probably) fewer interruptions. But ping times are still in the range of a few seconds while watching.
In fact, today I got evidence that my last observations must have been no more than luck. I haven't changed anything in the router configuration, but today I have severe problems with the connection again. Ping times at ~10s, taking almost a minute to recover. Emmanuel, is someone looking into that? Any more information I could provide?
Created attachment 187011 [details] dmesg output

I seem to be having the same problem.

Fedora 22
iwl7260-firmware-25.17.12.0-53.fc22.noarch

# cat /etc/modprobe.d/iwlwifi.conf
options iwlwifi 11n_disable=1

dmesg output attached. Any other information I could provide, or any experiments I could try? Thanks!
I'm sorry to hear about all these troubles. Emmanuel is on vacation now, so I'm backing him up for now. I'll ping our firmware team to check whether they have any news on this issue.
This is still on our firmware team's hands, but unfortunately we don't have much progress to report yet. I'll keep you updated.
New versions of the firmware are now available. At least one issue that has been identified in a few of the logs is, as far as I know, *not* fixed, so unfortunately I can't promise it will help. But it is worth trying... You can find them here: https://git.kernel.org/cgit/linux/kernel/git/iwlwifi/linux-firmware.git/ I suggest testing -13.ucode and above. -13 is pretty much end of life, but I suggest giving it a try since it did get a few updates since the version I attached to this bug. -15 and -16 are far more interesting to test. -17.ucode is still under stabilization. Thanks.
Created attachment 190231 [details] dmesg output I've been experiencing this issue on my asus UX303L on Arch Linux, kernel 4.2.3 - every time I download a few hundred kilobytes, everything hangs for 10 seconds. At first, then I booted a system from a different computer on there off an external drive (same kernel) and experienced this issue; then I rebooted on the main system and the issue was showing itself on *both* systems. I tried -13, -14, -15 and none of them showed any difference. I also tried -16 and -17 but neither of them could load, couldn't figure out why. Attached is a dmesg from a recent boot showcasing the issue multiple times. Let me know if there's more I can provide. lspci: 02:00.0 Network controller: Intel Corporation Wireless 7260 (rev 6b)
-16 and -17 can't be loaded from 4.2.3. You'll have to use our backport tree for that. You can find it here: https://git.kernel.org/cgit/linux/kernel/git/iwlwifi/backport-iwlwifi.git/
Created attachment 190291 [details] dmesg with drive plugged in/out I just realized something. On my main system, this issue happens *only* when I have my SSD plugged in to my USB3 port. Maybe this is a power issue of sorts? New dmesg attached, including the drive being plugged in and out much later and the hang happening a few times. Does this help?
This is still 15.ucode. You may have put 16.ucode on your file system, but the driver is still using 15.ucode. You need to update the driver.
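One way to check which firmware the driver really picked up is to look for the "loaded firmware version" line that iwlwifi prints at load time. The helper below is only a sketch: the function name `loaded_fw_version` is made up for illustration, and it assumes the log line keeps the wording seen in the logs posted in this bug.

```shell
# Print the firmware version(s) that iwlwifi reported loading.
# Reads kernel log text on stdin, e.g.:  dmesg | loaded_fw_version
# Assumption: the driver logs "loaded firmware version <ver> op_mode ...".
loaded_fw_version() {
    sed -n 's/.*iwlwifi.*loaded firmware version \([^ ]*\).*/\1/p'
}
```

If the printed version does not change after copying a newer .ucode file into place, the driver is most likely still loading the older file.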
Yes, I'm aware - I'm stuck on kernel 4.2.3 right now, and compiling the backports isn't something I have time to look into just yet. But I can reliably stop and reproduce the issue by plugging my ssd in or out of the USB socket.
Very strange.... I don't really buy the power theory, because the issue seems to be a radio problem. I would lean towards interference between the hard disk operation and the WiFi radio. But that is also a wild guess.
Once you update your driver, I'll help you collect the firmware logs.
I initiated a 2.6GB download. I had several instances where the speed fell off a cliff, with ping latency climbing like this:

64 bytes from 192.168.1.1: icmp_seq=449 ttl=255 time=2426 ms
64 bytes from 192.168.1.1: icmp_seq=450 ttl=255 time=2879 ms
64 bytes from 192.168.1.1: icmp_seq=451 ttl=255 time=2835 ms
64 bytes from 192.168.1.1: icmp_seq=452 ttl=255 time=3990 ms
64 bytes from 192.168.1.1: icmp_seq=453 ttl=255 time=4479 ms
64 bytes from 192.168.1.1: icmp_seq=454 ttl=255 time=5968 ms
64 bytes from 192.168.1.1: icmp_seq=455 ttl=255 time=5159 ms
64 bytes from 192.168.1.1: icmp_seq=456 ttl=255 time=4496 ms
64 bytes from 192.168.1.1: icmp_seq=457 ttl=255 time=7871 ms
64 bytes from 192.168.1.1: icmp_seq=458 ttl=255 time=7060 ms
64 bytes from 192.168.1.1: icmp_seq=459 ttl=255 time=6145 ms
64 bytes from 192.168.1.1: icmp_seq=460 ttl=255 time=8374 ms
64 bytes from 192.168.1.1: icmp_seq=461 ttl=255 time=7375 ms
64 bytes from 192.168.1.1: icmp_seq=462 ttl=255 time=6376 ms
64 bytes from 192.168.1.1: icmp_seq=463 ttl=255 time=5380 ms
64 bytes from 192.168.1.1: icmp_seq=464 ttl=255 time=4382 ms
64 bytes from 192.168.1.1: icmp_seq=465 ttl=255 time=3387 ms
64 bytes from 192.168.1.1: icmp_seq=466 ttl=255 time=2395 ms
64 bytes from 192.168.1.1: icmp_seq=467 ttl=255 time=1405 ms
64 bytes from 192.168.1.1: icmp_seq=468 ttl=255 time=419 ms
64 bytes from 192.168.1.1: icmp_seq=469 ttl=255 time=236 ms

Since bluetooth isn't turned on, it is definitely this bug.
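For anyone who wants to catch these stalls without watching the terminal, a small filter over the ping output can flag slow replies and gaps in icmp_seq. This is a rough sketch, not something from the driver developers: the function name `slow_pings` and the 1000 ms default threshold are arbitrary choices, and it assumes the usual Linux ping reply format shown in this bug.

```shell
# Print ping replies slower than a threshold (ms) and note gaps in icmp_seq.
# Usage: ping 192.168.1.1 | slow_pings 1000
slow_pings() {
    awk -v limit="${1:-1000}" '
    /icmp_seq=/ && /time=/ {
        seq = ""; ms = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^icmp_seq=/) { seq = substr($i, 10) + 0 }
            if ($i ~ /^time=/)     { ms  = substr($i, 6) + 0 }
        }
        # A jump in the sequence number means replies were lost in between.
        if (prev != "" && seq > prev + 1)
            printf "gap: %d replies missing after icmp_seq=%d\n", seq - prev - 1, prev
        if (ms > limit)
            printf "slow: icmp_seq=%d time=%s ms\n", seq, ms
        prev = seq
    }'
}
```

Leaving this running next to a download gives a short log of when the latency spikes started and ended, which makes it easier to line them up with dmesg timestamps.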
Here's the dmesg to begin with:

$ journalctl -b0 --no-pager | grep iwlwifi
Oct 17 08:26:53 shatrupa kernel: iwlwifi 0000:04:00.0: loaded firmware version 17.228510.0 op_mode iwlmvm
Oct 17 08:26:53 shatrupa kernel: iwlwifi 0000:04:00.0: Detected Intel(R) Dual Band Wireless AC 7260, REV=0x144
Oct 17 08:26:53 shatrupa kernel: iwlwifi 0000:04:00.0: L1 Enabled - LTR Enabled
Oct 17 08:26:53 shatrupa kernel: iwlwifi 0000:04:00.0: L1 Enabled - LTR Enabled
Oct 17 08:26:53 shatrupa kernel: iwlwifi 0000:04:00.0 wlp4s0: renamed from wlan0
Oct 17 08:26:55 shatrupa NetworkManager[931]: <info> rfkill2: found WiFi radio killswitch (at /sys/devices/pci0000:00/0000:00:1c.1/0000:04:00.0/ieee80211/phy0/rfkill2) (driver iwlwifi)
Oct 17 08:26:55 shatrupa NetworkManager[931]: <info> (wlp4s0): new 802.11 WiFi device (carrier: UNKNOWN, driver: 'iwlwifi', ifindex: 3)
Oct 17 08:26:55 shatrupa kernel: iwlwifi 0000:04:00.0: L1 Enabled - LTR Enabled
Oct 17 08:26:55 shatrupa kernel: iwlwifi 0000:04:00.0: L1 Enabled - LTR Enabled
Oct 17 08:26:55 shatrupa kernel: iwlwifi 0000:04:00.0: L1 Enabled - LTR Enabled
Oct 17 08:26:55 shatrupa kernel: iwlwifi 0000:04:00.0: L1 Enabled - LTR Enabled
Oct 17 08:26:58 shatrupa kernel: iwlwifi 0000:04:00.0 wlp4s0: disabling HT/VHT due to WEP/TKIP use
Oct 17 08:26:58 shatrupa kernel: iwlwifi 0000:04:00.0 wlp4s0: disabling HT as WMM/QoS is not supported by the AP
Oct 17 08:26:58 shatrupa kernel: iwlwifi 0000:04:00.0 wlp4s0: disabling VHT as WMM/QoS is not supported by the AP

The kernel and firmware are the latest ones based on the guidance in bug 97921.
Emmanuel, I just tried the new versions. I'm running now with

kernel 4.3.0-040300rc6-generic
iwlwifi 0000:03:00.0: loaded firmware version 17.228510.0 op_mode iwlmvm
(bluetooth turned off)

Still I get at times:

64 bytes from 192.168.1.1: icmp_seq=90 ttl=64 time=3927 ms
64 bytes from 192.168.1.1: icmp_seq=91 ttl=64 time=2950 ms
64 bytes from 192.168.1.1: icmp_seq=92 ttl=64 time=2830 ms
...

Anything else I should look for? Btw, I didn't install the bluetooth firmware update from bug 97921, does this make a difference? What can I provide for further debugging, do you need another firmware dump?
Btw, I also tried with -16 firmware - same thing there as well.
Created attachment 191431 [details] Core14 FW with uSniffer Can you please record a firmware dump with this firmware? Please stop the collection as soon as you have long ping latencies. Thanks.
Created attachment 191761 [details] firmware dump

Hi Emmanuel, please find attached a new dump. While watching a video on youtube, I got

64 bytes from 192.168.1.1: icmp_seq=2040 ttl=64 time=22945 ms
64 bytes from 192.168.1.1: icmp_seq=2041 ttl=64 time=22864 ms
...

Thanks.
Hi Emmanuel, thanks to your hint I just read the related report, bug 103531. The problem looks similar (though I get slow connection only, no firmware crashes). Do you have a clue whether both issues have the same root cause? I'd be happy to assist in providing more feedback if that helps.
I also occasionally face this issue, say once in 3 days. I am running the following:

Kernel
-------
$ uname -r
4.4.0-0.rc6.git1.1.fc24.x86_64

Logs
------
Dec 30 09:08:04 shatrupa kernel: iwlwifi 0000:04:00.0: loaded firmware version 16.242414.0 op_mode iwlmvm
Created attachment 206391 [details] Core14 FW with more debug probes Our firmware team introduced more debug prints to try to understand what is going on. Can you please run with the firmware attached and produce a dump once again? Thank you.
Emmanuel, thanks, I will do that, give me a few days. To be honest, I'm somewhat disappointed with that device - that's why I ordered a 3160 card yesterday as a replacement. Hopefully this gives a more stable connection.
Created attachment 206721 [details] firmware dump

Emmanuel, I created a new dump using

echo 1 > /sys/kernel/debug/iwlwifi/0000:03:00.0/iwlmvm/fw_restart
cat /sys/devices/virtual/devcoredump/devcd1/data > iwl.dump

while I had high latency (ping time ~ 10s loading a youtube video).

This is with kernel 4.5.0-040500rc6-generic and

iwlwifi 0000:03:00.0: loaded firmware version 17.295852.0 op_mode iwlmvm

Did I miss something? Please let me know.
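For others who need to collect a dump the same way, the two commands can be wrapped in a small function that checks each step. This is only a sketch under assumptions: the name `collect_fw_dump` is invented here, the 10 second wait is a guess, and both the PCI address (0000:03:00.0) and the devcoredump instance (devcd1) are specific to the machine above and will likely differ elsewhere.

```shell
# Trigger a firmware restart and save the resulting devcoredump.
# $1: fw_restart debugfs file, $2: devcoredump data file, $3: output file
collect_fw_dump() {
    trigger=$1 data=$2 out=$3
    if [ ! -w "$trigger" ]; then
        echo "cannot write $trigger (need root? debugfs mounted?)" >&2
        return 1
    fi
    echo 1 > "$trigger"
    # Assumption: the dump may take a moment to appear; poll for up to 10 s.
    tries=0
    while [ ! -r "$data" ] && [ "$tries" -lt 10 ]; do
        sleep 1
        tries=$((tries + 1))
    done
    [ -r "$data" ] || { echo "no dump at $data" >&2; return 1; }
    cat "$data" > "$out"
}
```

As root, and with the paths from this comment, the call would be `collect_fw_dump /sys/kernel/debug/iwlwifi/0000:03:00.0/iwlmvm/fw_restart /sys/devices/virtual/devcoredump/devcd1/data iwl.dump`, run while the latency is high.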
Excellent - thanks ... again...
Stefan, I forwarded your data to our firmware team for analysis. Hezi Hezkiyahu will share his findings in bugzilla.
Hello,

Looking at the logs, I don’t see any evidence for a bug, but I do see that there is some link issue, meaning the signal your wifi is receiving is weak. I also suspect a problem in one of your antennas and/or interference from other routers in the area.

To pin-point the issue, we need to check for a few things. Please try to do the following:

1. How many brick/concrete walls are there between the router and your device? Can you move it? Moving the router a little may resolve most of the issues.
2. Please run a scan and attach the results (you can use ‘sudo iw wlan0 scan > scan_list.txt’).
3. Did you install the wifi card by yourself? It may be that one of the antennas is not connected properly. Can you make sure the antenna connectors are firmly attached?
4. You may also install a ‘wifi analyzer’ on your cellular phone and use it. It will show the scan results in a graphical display, and show you all other routers in the area and their signal strength. It may be that changing your router’s channel will also solve the issue.

Thanks a lot.
Hezi Hezkiyahu
INTEL WIFI FW team.
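As a rough text-only substitute for a graphical analyzer, the scan output can be summarised by counting how many access points sit on each frequency. The sketch below assumes each network in the `iw` scan output has a line of the form "freq: 2462"; `iw` does not promise a stable output format, so treat this as illustrative. The function name `networks_per_channel` is made up, and the channel number is derived with the standard 2.4 GHz mapping (channel = (MHz - 2407) / 5, valid for channels 1 to 13).

```shell
# Count access points per frequency in `iw ... scan` output read from stdin.
# Usage: sudo iw wlan0 scan | networks_per_channel
networks_per_channel() {
    awk '
    $1 == "freq:" { count[int($2)]++ }
    END {
        for (f in count) {
            mhz = f + 0
            if (mhz >= 2412 && mhz <= 2472)
                printf "%d MHz (channel %d): %d\n", mhz, (mhz - 2407) / 5, count[f]
            else
                printf "%d MHz: %d\n", mhz, count[f]
        }
    }' | sort -n
}
```

It can also be run on a saved file, for example `networks_per_channel < scan_list.txt`. Keep in mind that neighbouring 2.4 GHz channels overlap, so a quiet channel next to a crowded one is not necessarily clean.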
Hello Hezi, thanks for looking into this. There's a single concrete wall and the router is located in ~4m distance. I played already quite a bit with channels, this was already using the best combination I found (could be much worse on other channels). No matter what, I just replaced the card by a new 3160 card. Both antenna cables were firmly attached to the connectors of the 7260 when I removed it, so I don't think this caused my issues (in fact, I hadn't touched the card before, it came pre-installed with the main board when I bought it). Whatsoever, with the 3160 it's a whole other story; data rates are up to 1500kb/s where I got 150-300kb/s before. Almost no latency, ping rates in the ms range where they should be; youtube back to usable. All other parameters were unchanged, by the way (channel settings, router / antenna location, etc.)...