Bug 97291 - iwlwifi 7260: transfer stalls - MWG100235061
Status: CLOSED WILL_NOT_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless
Hardware: Intel Linux
Importance: P1 normal
Assignee: DO NOT USE - assign "network-wireless-intel" component instead

Reported: 2015-04-25 21:05 UTC by Stefan Soeffing
Modified: 2016-03-08 20:40 UTC
CC List: 8 users

See Also:
Kernel Version: 4.0.0
Tree: Mainline
Regression: No


Attachments
dmesg output (73.30 KB, text/plain)
2015-04-25 21:05 UTC, Stefan Soeffing
firmware dump (986.37 KB, application/pgp-encrypted)
2015-04-25 21:08 UTC, Stefan Soeffing
dmesg output (86.12 KB, text/plain)
2015-05-01 17:22 UTC, Stefan Soeffing
dmesg output, no queue hang (66.92 KB, text/plain)
2015-05-02 08:26 UTC, Stefan Soeffing
Core10 firmware -13.ucode (768.48 KB, application/octet-stream)
2015-05-03 13:06 UTC, Emmanuel Grumbach
Firmware dump for 25.18.12.0 (130.30 KB, application/octet-stream)
2015-05-05 14:00 UTC, sheksis
dmesg output for 25.18.12.0 (74.52 KB, application/octet-stream)
2015-05-05 14:01 UTC, sheksis
25.12.17.0 with uSniffer (764.09 KB, application/octet-stream)
2015-05-05 18:29 UTC, Emmanuel Grumbach
uSniffer iwl dump for 25.12.17.0 (4.13 MB, application/octet-stream)
2015-05-05 18:59 UTC, sheksis
Corresponding dmesg output for above. (75.77 KB, application/octet-stream)
2015-05-05 18:59 UTC, sheksis
dmesg with -13 firmware (173.70 KB, text/plain)
2015-05-06 16:52 UTC, Stefan Soeffing
Core10 firmware -13.ucode (768.48 KB, application/octet-stream)
2015-05-06 16:56 UTC, Emmanuel Grumbach
iwl dump and dmesg snip (107.44 KB, application/x-bzip)
2015-05-07 23:41 UTC, Jasper Mackenzie
new bluetooth firmware (24.77 KB, application/octet-stream)
2015-05-08 11:16 UTC, Tedd An
dmesg with -13 firmware, -rc3 (170.51 KB, text/plain)
2015-05-11 20:34 UTC, Stefan Soeffing
firmware dump, new and old AP (1.85 MB, application/pgp-encrypted)
2015-05-12 10:26 UTC, Stefan Soeffing
Core10 firmware -13.ucode with uSniffer (864.48 KB, application/octet-stream)
2015-06-02 06:43 UTC, Emmanuel Grumbach
Core10 firmware -13.ucode with uSniffer (768.60 KB, application/octet-stream)
2015-06-02 17:34 UTC, Emmanuel Grumbach
firmware dump with -13 firmware (834.93 KB, application/pgp-encrypted)
2015-06-02 18:33 UTC, Stefan Soeffing
firmware dump, ch 1 (92.08 KB, application/pgp-encrypted)
2015-07-14 16:02 UTC, Stefan Soeffing
dmesg output (75.56 KB, text/plain)
2015-09-08 02:07 UTC, Stefan Mashkevich
dmesg output (107.12 KB, text/plain)
2015-10-14 19:01 UTC, Jerome Leclanche
dmesg with drive plugged in/out (155.71 KB, text/plain)
2015-10-15 22:41 UTC, Jerome Leclanche
Core14 FW with uSniffer (1.00 MB, application/octet-stream)
2015-10-28 20:36 UTC, Emmanuel Grumbach
firmware dump (784.83 KB, application/pgp-encrypted)
2015-10-31 20:48 UTC, Stefan Soeffing
Core14 FW with more debug probes (1.00 MB, application/octet-stream)
2016-02-29 07:38 UTC, Emmanuel Grumbach
firmware dump (1.07 MB, application/pgp-encrypted)
2016-03-03 20:54 UTC, Stefan Soeffing

Description Stefan Soeffing 2015-04-25 21:05:25 UTC
Created attachment 175081 [details]
dmesg output

All,

after experiencing bug 93431 I was happy to have a solution at hand. Yesterday I changed my router (Fritz!Box 7112 to Fritz!Box 7312) and now see new problems come up (I'm not sure yet whether they are indeed related to the new router; I'll keep testing). They might also be related to bug 95941, though I'm not totally sure.

In the beginning I usually start with a decent connection and low ping. Sometimes ping goes up to a few hundreds of ms but recovers quickly (after seconds).

However, sometimes the ping time goes up dramatically and the connection stalls. (Still, I do not get the 'No buffer space available' message as in bug 93431.) After some minutes, it recovers. Running ping during that period gives (note the dropped packets in between):

64 bytes from 192.168.1.1: icmp_seq=3523 ttl=64 time=1.75 ms
64 bytes from 192.168.1.1: icmp_seq=3524 ttl=64 time=11794 ms
64 bytes from 192.168.1.1: icmp_seq=3525 ttl=64 time=11021 ms
64 bytes from 192.168.1.1: icmp_seq=3849 ttl=64 time=1021 ms
64 bytes from 192.168.1.1: icmp_seq=3850 ttl=64 time=28.5 ms

This was with kernel 4.0.0 and happens with both 25.16.12.0 (from bug 93431) and 25.17.12.0 (from bug 95941).
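
The gap between icmp_seq=3525 and icmp_seq=3849 in the output above can be quantified with a short script. This is a minimal sketch, assuming Python 3 and the Linux `ping` reply format shown in this report; the names `parse_ping` and `lost_between` are made up for illustration, and the sketch assumes replies arrive in sequence order.

```python
import re

# Matches reply lines such as
# "64 bytes from 192.168.1.1: icmp_seq=3524 ttl=64 time=11794 ms"
REPLY = re.compile(r"icmp_seq=(\d+) .*time=([\d.]+) ms")


def parse_ping(lines):
    """Return a list of (icmp_seq, rtt_ms) for every reply line."""
    replies = []
    for line in lines:
        match = REPLY.search(line)
        if match:
            replies.append((int(match.group(1)), float(match.group(2))))
    return replies


def lost_between(replies):
    """Count sequence numbers skipped between consecutive replies."""
    return sum(b[0] - a[0] - 1 for a, b in zip(replies, replies[1:]))


# The five reply lines quoted in the description.
sample = """\
64 bytes from 192.168.1.1: icmp_seq=3523 ttl=64 time=1.75 ms
64 bytes from 192.168.1.1: icmp_seq=3524 ttl=64 time=11794 ms
64 bytes from 192.168.1.1: icmp_seq=3525 ttl=64 time=11021 ms
64 bytes from 192.168.1.1: icmp_seq=3849 ttl=64 time=1021 ms
64 bytes from 192.168.1.1: icmp_seq=3850 ttl=64 time=28.5 ms
""".splitlines()

replies = parse_ping(sample)
print("replies seen:", len(replies))
print("requests without a reply:", lost_between(replies))
print("worst round trip (ms):", max(rtt for _, rtt in replies))
```

Feeding the script a complete ping log instead of the excerpt would give the loss count for the whole stall rather than for these five lines.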

See attached output from
echo 1 > /sys/kernel/debug/iwlwifi/0000:03:00.0/iwlmvm/fw_restart
cat /sys/devices/virtual/devcoredump/devcd1/data > iwl.dump
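
The two commands above can be wrapped in one step. The following is a minimal sketch, assuming Python 3 run as root; the function name `collect_dump` is made up for illustration, the paths are the ones from this report (the PCI address and the `devcd1` number will differ between machines), and the sketch assumes the dump is readable immediately after the restart is triggered.

```python
import shutil
from pathlib import Path

# Paths copied from the commands in this report; adjust for your machine.
FW_RESTART = Path("/sys/kernel/debug/iwlwifi/0000:03:00.0/iwlmvm/fw_restart")
DUMP_DATA = Path("/sys/devices/virtual/devcoredump/devcd1/data")


def collect_dump(fw_restart=FW_RESTART, dump_data=DUMP_DATA, out="iwl.dump"):
    """Trigger a firmware restart, then copy the resulting dump to `out`."""
    # Same effect as: echo 1 > .../fw_restart
    Path(fw_restart).write_text("1\n")
    # Same effect as: cat .../data > iwl.dump
    with open(dump_data, "rb") as src, open(out, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return Path(out)
```

Calling `collect_dump()` with no arguments uses the paths from this report; passing other paths makes the same function usable on a card at a different PCI address.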
Comment 1 Stefan Soeffing 2015-04-25 21:08:05 UTC
Created attachment 175091 [details]
firmware dump
Comment 2 Stefan Soeffing 2015-04-27 20:45:05 UTC
It's frustrating, today I wasn't even able to connect to the wireless network, so I finally switched back to the old router (which works pretty well with the firmware from bug 93431).
Comment 3 Emmanuel Grumbach 2015-04-28 11:10:10 UTC
If you don't CC ilw@linux.intel.com, I won't be notified about the bug...
Anyway, just did that.


Please let me know what is the configuration of the AP:
I am especially interested to know the bandwidth.

What is your distribution?
Comment 4 Stefan Soeffing 2015-04-28 18:44:29 UTC
Sorry, I was not aware of that.

Fortunately I can connect to the network again (with the new AP). Strangely enough, I am not aware of having changed any relevant setting (was it related to the bad weather we had yesterday?).

The new AP uses the 802.11n standard (while the previous one did not support that). According to the manual it will automatically determine the best bandwidth based on the number of networks in the vicinity. So I'm not sure what it was actually using when the problems occurred last time; right now it shows 20MHz bandwidth.
Channel selection is set to automatic mode.

I'll play with the settings and report back. Is there anything specific you'd recommend to look for?

I'm using ubuntustudio 14.10.
Comment 5 Emmanuel Grumbach 2015-04-28 18:51:57 UTC
Ok - does it support 11ac by chance?
I am afraid that your AP is changing to a channel that is not allowed by the regulatory database of Ubuntu which is 2 years old...
They are working on updating it.

You can try to update your regulatory domain to US:
iw reg set US

the frequency of your AP might be allowed in the database for US.
Another possibility is to update the regulatory database.

See the link to the bug I added here.

All this is just a guess though.
Comment 6 Stefan Soeffing 2015-04-28 19:13:11 UTC
Unfortunately not, even for 11n it's restricted to 2.4GHz (I hadn't chosen that AP myself, but I got it almost for free from my ISP with a new contract).

I set it to fixed channel now, this one seems to work (i.e. I can establish the connection in the first place). No connection issue so far, but need to test a bit more.
Comment 7 Emmanuel Grumbach 2015-05-01 05:10:18 UTC
News?
Comment 8 Stefan Soeffing 2015-05-01 17:22:28 UTC
Created attachment 175451 [details]
dmesg output

So far I had no more issues connecting to the network, thanks for your hint.

Still, I get very high ping times (20-30s) at some times. From the dmesg output it seems that I do see the queue hang as well, as in bug 95941.

Nevertheless, I think that I had similar effects without the queue stuck report in dmesg. I'll need some more time to see if those things are related or not. Anyway, is there anything I could provide in the meantime to help debugging the queue hang issue?
Comment 9 Stefan Soeffing 2015-05-02 08:26:57 UTC
Created attachment 175551 [details]
dmesg output, no queue hang

Finally I also had the issue without any indication of the queue hang in dmesg (see attached). It was not as severe as I had seen before (ping reply 2-3s for a time of ~15s, then recovered to normal).

I'm on kernel 4.0.0 with 25.17.12.0. Creating a dump doesn't work (probably not supported in this version?):
cat: /sys/devices/virtual/devcoredump/devcd1/data: File not found.

Is there any data I can provide?
Comment 10 Emmanuel Grumbach 2015-05-02 17:43:13 UTC
The dump is created only when you have a firmware error: a firmware crash or a stuck queue.
Ping replies of 2~3 s can happen if you get disconnected, not if the connection is kept.
I'll leave this bug open for now. I will publish a new version of the firmware (-13.ucode) in the coming days. This version will be available starting 4.1. I can provide a backport based version of the driver if you are interested in testing this new version.
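
Since a dump only exists after a firmware error, it can help to check for one before trying to read it. This is a minimal sketch, assuming Python 3 and the devcoredump directory layout seen in this report (`/sys/devices/virtual/devcoredump/devcdN/data`); the function name `find_dumps` is made up for illustration.

```python
from pathlib import Path


def find_dumps(base="/sys/devices/virtual/devcoredump"):
    """Return the data files of all pending device coredumps, sorted by path."""
    return sorted(Path(base).glob("devcd*/data"))
```

An empty list from `find_dumps()` means no firmware error has produced a dump yet, which would match the "File not found" message from `cat`.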
Comment 11 Stefan Soeffing 2015-05-03 07:35:02 UTC
I get the impression that the issue is triggered only if there is some traffic happening. Ping times stay in the ms range for a long time when I leave it running alone. Starting a file download or a YouTube video usually causes ping times to go up (now I've seen up to 10s). Downloads will then creep along at a few kb/s for some time, until the link recovers at some point and I get > 500kb/s rates.

I'd be happy to look into that further when -13 is out. No problem to install 4.1 to do so when it is out.
Comment 12 Emmanuel Grumbach 2015-05-03 07:38:12 UTC
Ok.

I can send a backport based driver so that you can run our latest driver / firmware without changing your base kernel.
Let me know.
Comment 13 Stefan Soeffing 2015-05-03 13:04:54 UTC
Please leave me a note when -13 is published. I think there's no need for a backport, I'll try with 4.1rc1.
Comment 14 Emmanuel Grumbach 2015-05-03 13:06:55 UTC
Created attachment 175711 [details]
Core10 firmware -13.ucode

here you go.
Comment 15 Stefan Soeffing 2015-05-03 20:23:31 UTC
Thanks, Emmanuel. I won't have time to test it right away, might take up to a week to do so. I'll keep you posted.
Comment 16 sheksis 2015-05-04 17:05:07 UTC
Sadly, I am getting this with the firmware from bug 93431.

ping
------
64 bytes from 192.168.1.1: icmp_seq=12514 ttl=254 time=2241 ms
64 bytes from 192.168.1.1: icmp_seq=12515 ttl=254 time=1573 ms
64 bytes from 192.168.1.1: icmp_seq=12516 ttl=254 time=607 ms
64 bytes from 192.168.1.1: icmp_seq=12517 ttl=254 time=648 ms
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available



journalctl logs firmware version
----------------------------------
May 04 18:07:20 shatrupa kernel: iwlwifi 0000:04:00.0: loaded firmware version 25.16.12.0 op_mode iwlmvm


Sorry, I couldn't collect the core this time. I will try to do it the next time it happens.

I can also try this new firmware but I cannot run kernel 4.0. If you can give the kernel for 3.19 series then I can test it out.
Comment 17 Emmanuel Grumbach 2015-05-04 17:28:33 UTC
You should be running 25.17.12.0.
Please check with this firmware.
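
Which firmware the driver actually loaded can be read from the kernel log line quoted earlier in this thread. The following is a minimal sketch, assuming Python 3 and the "loaded firmware version ... op_mode ..." message format shown in this report; the function name `loaded_firmware` is made up for illustration.

```python
import re

# Matches e.g. "... loaded firmware version 25.16.12.0 op_mode iwlmvm"
LOADED = re.compile(r"loaded firmware version (\S+) op_mode (\S+)")


def loaded_firmware(log_lines):
    """Return (version, op_mode) from the last matching line, or None."""
    found = None
    for line in log_lines:
        match = LOADED.search(line)
        if match:
            found = (match.group(1), match.group(2))
    return found


# Log line copied from comment 16.
log = [
    "May 04 18:07:20 shatrupa kernel: iwlwifi 0000:04:00.0: "
    "loaded firmware version 25.16.12.0 op_mode iwlmvm",
]
version, op_mode = loaded_firmware(log)
print("loaded:", version, "| is 25.17.12.0:", version == "25.17.12.0")
```

Taking the last match rather than the first matters when the log covers several driver reloads.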
Comment 18 sheksis 2015-05-04 19:06:55 UTC
@Emmanuel

May 05 00:01:55 shatrupa kernel: iwlwifi 0000:04:00.0: loaded firmware version 25.17.12.0 op_mode iwlmvm


64 bytes from 192.168.1.1: icmp_seq=1846 ttl=254 time=1094 ms
64 bytes from 192.168.1.1: icmp_seq=1847 ttl=254 time=4517 ms
64 bytes from 192.168.1.1: icmp_seq=1848 ttl=254 time=3867 ms
64 bytes from 192.168.1.1: icmp_seq=1849 ttl=254 time=3109 ms
64 bytes from 192.168.1.1: icmp_seq=1851 ttl=254 time=1967 ms
64 bytes from 192.168.1.1: icmp_seq=1852 ttl=254 time=4545 ms
64 bytes from 192.168.1.1: icmp_seq=1853 ttl=254 time=6021 ms
64 bytes from 192.168.1.1: icmp_seq=1854 ttl=254 time=5742 ms
64 bytes from 192.168.1.1: icmp_seq=1857 ttl=254 time=3461 ms
64 bytes from 192.168.1.1: icmp_seq=1858 ttl=254 time=3000 ms
64 bytes from 192.168.1.1: icmp_seq=1859 ttl=254 time=2073 ms
64 bytes from 192.168.1.1: icmp_seq=1860 ttl=254 time=1666 ms
64 bytes from 192.168.1.1: icmp_seq=1861 ttl=254 time=1465 ms
64 bytes from 192.168.1.1: icmp_seq=1862 ttl=254 time=1107 ms


Don't know why but on some days the problem is more pronounced than others.
Comment 19 Emmanuel Grumbach 2015-05-05 13:01:35 UTC
The firmware team ported a fix to the -12.ucode:

https://git.kernel.org/cgit/linux/kernel/git/iwlwifi/linux-firmware.git/plain/iwlwifi-7260-12.ucode

I'd be grateful if you could test it. Thanks.
Comment 20 sheksis 2015-05-05 13:20:10 UTC
Testing this:

May 05 18:41:39 shatrupa kernel: iwlwifi 0000:04:00.0: loaded firmware version 25.18.12.0 op_mode iwlmvm

I'll let you know if something goes wrong.
Comment 21 sheksis 2015-05-05 13:58:28 UTC
I think it's worse:

Ping
-----
64 bytes from 192.168.1.1: icmp_seq=106 ttl=254 time=1.47 ms
64 bytes from 192.168.1.1: icmp_seq=107 ttl=254 time=13.3 ms
64 bytes from 192.168.1.1: icmp_seq=108 ttl=254 time=7.13 ms
64 bytes from 192.168.1.1: icmp_seq=109 ttl=254 time=564 ms
64 bytes from 192.168.1.1: icmp_seq=110 ttl=254 time=929 ms
64 bytes from 192.168.1.1: icmp_seq=111 ttl=254 time=447 ms
64 bytes from 192.168.1.1: icmp_seq=112 ttl=254 time=235 ms
64 bytes from 192.168.1.1: icmp_seq=113 ttl=254 time=383 ms
64 bytes from 192.168.1.1: icmp_seq=114 ttl=254 time=1011 ms
64 bytes from 192.168.1.1: icmp_seq=115 ttl=254 time=35.2 ms
64 bytes from 192.168.1.1: icmp_seq=116 ttl=254 time=1224 ms
64 bytes from 192.168.1.1: icmp_seq=117 ttl=254 time=1682 ms
64 bytes from 192.168.1.1: icmp_seq=118 ttl=254 time=2055 ms
64 bytes from 192.168.1.1: icmp_seq=119 ttl=254 time=1810 ms
64 bytes from 192.168.1.1: icmp_seq=120 ttl=254 time=1617 ms
64 bytes from 192.168.1.1: icmp_seq=121 ttl=254 time=1211 ms
64 bytes from 192.168.1.1: icmp_seq=122 ttl=254 time=1437 ms


See the instant jump in latency on opening a youtube video.

Firmware Version
------------------
Cross checking again:

May 05 19:19:18 shatrupa kernel: iwlwifi 0000:04:00.0: loaded firmware version 25.18.12.0 op_mode iwlmvm


Attaching dmesg and dump
Comment 22 sheksis 2015-05-05 14:00:34 UTC
Created attachment 175871 [details]
Firmware dump for 25.18.12.0
Comment 23 sheksis 2015-05-05 14:01:01 UTC
Created attachment 175881 [details]
dmesg output for 25.18.12.0
Comment 24 Emmanuel Grumbach 2015-05-05 14:13:26 UTC
this was a production firmware and hence the dump is useless.

Thanks anyway.
Comment 25 sheksis 2015-05-05 14:17:00 UTC
@Emmanuel
I can try out debug firmwares if you can provide me any.

By the way, 25.18.12.0 is so bad that I am now getting:

ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available

Reverting back to 25.17.12.0.
Comment 26 Emmanuel Grumbach 2015-05-05 14:27:28 UTC
ok - thanks...

let me know if reverting back to 25.17.12.0 helps...
Comment 27 sheksis 2015-05-05 16:35:42 UTC
No respite. Although I haven't hit the buffer space error with 25.17.12.0, that might be accidental.

PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
64 bytes from 192.168.1.1: icmp_seq=1 ttl=254 time=14303 ms
64 bytes from 192.168.1.1: icmp_seq=2 ttl=254 time=16518 ms
64 bytes from 192.168.1.1: icmp_seq=3 ttl=254 time=16148 ms
64 bytes from 192.168.1.1: icmp_seq=4 ttl=254 time=17028 ms
64 bytes from 192.168.1.1: icmp_seq=5 ttl=254 time=17185 ms
64 bytes from 192.168.1.1: icmp_seq=6 ttl=254 time=18344 ms


I'll wait till you guys have something more to say about it. This happens to be the only card that came with my Thinkpad, and I'm facing a lot of trouble in my day-to-day work (VPN etc.).
I remember specifically customizing the TP to get the Intel card instead of the inbuilt TP one :)
Comment 28 Emmanuel Grumbach 2015-05-05 18:29:31 UTC
Created attachment 175921 [details]
25.12.17.0 with uSniffer

Please create a dump with the firmware attached.
I will forward the information to the firmware team.

Thank you.
Comment 29 sheksis 2015-05-05 18:56:31 UTC
Thanks. Here you go

journalctl
------------
May 06 00:19:57 shatrupa kernel: iwlwifi 0000:04:00.0: loaded firmware version 25.17.12.0 op_mode iwlmvm


Ping instance where it suddenly deteriorated
----------------------------------------------
64 bytes from 192.168.1.1: icmp_seq=113 ttl=254 time=11.4 ms
64 bytes from 192.168.1.1: icmp_seq=114 ttl=254 time=2.13 ms
64 bytes from 192.168.1.1: icmp_seq=115 ttl=254 time=4.68 ms
64 bytes from 192.168.1.1: icmp_seq=116 ttl=254 time=4.39 ms
64 bytes from 192.168.1.1: icmp_seq=117 ttl=254 time=3.85 ms
64 bytes from 192.168.1.1: icmp_seq=118 ttl=254 time=6.68 ms
64 bytes from 192.168.1.1: icmp_seq=119 ttl=254 time=923 ms
64 bytes from 192.168.1.1: icmp_seq=121 ttl=254 time=5632 ms
64 bytes from 192.168.1.1: icmp_seq=122 ttl=254 time=7788 ms
64 bytes from 192.168.1.1: icmp_seq=123 ttl=254 time=6788 ms
64 bytes from 192.168.1.1: icmp_seq=124 ttl=254 time=5951 ms
64 bytes from 192.168.1.1: icmp_seq=125 ttl=254 time=4975 ms
64 bytes from 192.168.1.1: icmp_seq=126 ttl=254 time=4383 ms
64 bytes from 192.168.1.1: icmp_seq=129 ttl=254 time=5336 ms
Comment 30 sheksis 2015-05-05 18:59:01 UTC
Created attachment 175931 [details]
uSniffer iwl dump for 25.12.17.0
Comment 31 sheksis 2015-05-05 18:59:55 UTC
Created attachment 175941 [details]
Corresponding dmesg output for above.
Comment 32 Emmanuel Grumbach 2015-05-05 19:05:19 UTC
data has been transferred to the firmware team.
Comment 33 Stefan Soeffing 2015-05-06 16:52:58 UTC
Created attachment 176031 [details]
dmesg with -13 firmware

Emmanuel,

this is what I get with -13 firmware and 4.1rc2. Won't connect to the network.
Comment 34 Emmanuel Grumbach 2015-05-06 16:55:57 UTC
Wow....

it works for me...
I am not involved in the firmware, I am just delivering it...
Comment 35 Emmanuel Grumbach 2015-05-06 16:56:58 UTC
Created attachment 176041 [details]
Core10 firmware -13.ucode

here is a new version of the -13.ucode.... Let me know.
Comment 36 Emmanuel Grumbach 2015-05-06 18:24:20 UTC
ah... I see
You are missing a patch that is on its way upstream...

You can pull iwlwifi-fixes.git or wait for -rc3.
I can also send a backport version.
Comment 37 sheksis 2015-05-07 16:57:18 UTC
I made a discovery. I listen to music through a bluetooth speaker. So the speaker is invariably on when I run a youtube video. Today it was off... and the moment I turned it on, see the jump in latency...

64 bytes from 192.168.1.1: icmp_seq=2729 ttl=254 time=1.75 ms
64 bytes from 192.168.1.1: icmp_seq=2730 ttl=254 time=3.98 ms
64 bytes from 192.168.1.1: icmp_seq=2731 ttl=254 time=5.22 ms
64 bytes from 192.168.1.1: icmp_seq=2732 ttl=254 time=1.74 ms
64 bytes from 192.168.1.1: icmp_seq=2733 ttl=254 time=443 ms
64 bytes from 192.168.1.1: icmp_seq=2734 ttl=254 time=157 ms
64 bytes from 192.168.1.1: icmp_seq=2735 ttl=254 time=2163 ms
64 bytes from 192.168.1.1: icmp_seq=2736 ttl=254 time=2463 ms
64 bytes from 192.168.1.1: icmp_seq=2737 ttl=254 time=2198 ms
64 bytes from 192.168.1.1: icmp_seq=2738 ttl=254 time=1228 ms
64 bytes from 192.168.1.1: icmp_seq=2739 ttl=254 time=1008 ms
64 bytes from 192.168.1.1: icmp_seq=2740 ttl=254 time=187 ms


Now with this latency, the video was about to get stuck when I immediately switched the speaker off. The latency took some time to recover, but when it did, it did not climb back again.

64 bytes from 192.168.1.1: icmp_seq=2802 ttl=254 time=3154 ms
64 bytes from 192.168.1.1: icmp_seq=2803 ttl=254 time=2950 ms
64 bytes from 192.168.1.1: icmp_seq=2804 ttl=254 time=2908 ms
64 bytes from 192.168.1.1: icmp_seq=2805 ttl=254 time=2287 ms
64 bytes from 192.168.1.1: icmp_seq=2806 ttl=254 time=1335 ms
64 bytes from 192.168.1.1: icmp_seq=2807 ttl=254 time=475 ms
64 bytes from 192.168.1.1: icmp_seq=2808 ttl=254 time=11.4 ms
64 bytes from 192.168.1.1: icmp_seq=2809 ttl=254 time=2.19 ms
64 bytes from 192.168.1.1: icmp_seq=2810 ttl=254 time=3.35 ms
64 bytes from 192.168.1.1: icmp_seq=2811 ttl=254 time=1.71 ms
64 bytes from 192.168.1.1: icmp_seq=2812 ttl=254 time=3.62 ms
64 bytes from 192.168.1.1: icmp_seq=2813 ttl=254 time=1.21 ms


I again switched on the speaker and the latency jumped yet again

64 bytes from 192.168.1.1: icmp_seq=2980 ttl=254 time=5.37 ms
64 bytes from 192.168.1.1: icmp_seq=2981 ttl=254 time=1.06 ms
64 bytes from 192.168.1.1: icmp_seq=2982 ttl=254 time=5.96 ms
64 bytes from 192.168.1.1: icmp_seq=2983 ttl=254 time=1.45 ms
64 bytes from 192.168.1.1: icmp_seq=2984 ttl=254 time=39.4 ms
64 bytes from 192.168.1.1: icmp_seq=2985 ttl=254 time=906 ms
64 bytes from 192.168.1.1: icmp_seq=2986 ttl=254 time=869 ms
64 bytes from 192.168.1.1: icmp_seq=2987 ttl=254 time=1271 ms
64 bytes from 192.168.1.1: icmp_seq=2988 ttl=254 time=1554 ms

I repeated the process many times with repeatable results. Maybe the kernel has a role here?

@Stefan : Do you by chance use a bluetooth device too?

@Emmanuel: Let me know what you think about it. I can collect any logs that you might want.
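
To line up latency jumps with the moments the speaker was switched on and off, the ping log can be reduced to high-latency episodes. This is a minimal sketch, assuming Python 3 and the ping format shown in this comment; the 100 ms threshold and the function name `episodes` are arbitrary choices made for illustration.

```python
import re

REPLY = re.compile(r"icmp_seq=(\d+) .*time=([\d.]+) ms")


def episodes(lines, threshold_ms=100.0):
    """Group consecutive replies with RTT >= threshold into (first_seq, last_seq)."""
    result, start, last = [], None, None
    for line in lines:
        match = REPLY.search(line)
        if not match:
            continue
        seq, rtt = int(match.group(1)), float(match.group(2))
        if rtt >= threshold_ms:
            if start is None:
                start = seq          # a new episode begins here
            last = seq
        elif start is not None:
            result.append((start, last))  # latency is back below the threshold
            start = None
    if start is not None:
        result.append((start, last))      # log ended during an episode
    return result


# Lines copied from the recovery and the second switch-on in this comment.
sample = """\
64 bytes from 192.168.1.1: icmp_seq=2805 ttl=254 time=2287 ms
64 bytes from 192.168.1.1: icmp_seq=2806 ttl=254 time=1335 ms
64 bytes from 192.168.1.1: icmp_seq=2807 ttl=254 time=475 ms
64 bytes from 192.168.1.1: icmp_seq=2808 ttl=254 time=11.4 ms
64 bytes from 192.168.1.1: icmp_seq=2809 ttl=254 time=2.19 ms
64 bytes from 192.168.1.1: icmp_seq=2983 ttl=254 time=1.45 ms
64 bytes from 192.168.1.1: icmp_seq=2984 ttl=254 time=39.4 ms
64 bytes from 192.168.1.1: icmp_seq=2985 ttl=254 time=906 ms
64 bytes from 192.168.1.1: icmp_seq=2986 ttl=254 time=869 ms
""".splitlines()

for first, last in episodes(sample):
    print(f"high latency from icmp_seq={first} to icmp_seq={last}")
```

With ping sending one request per second by default, the sequence numbers double as a rough timeline that can be compared against when the Bluetooth device was active.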
Comment 38 Emmanuel Grumbach 2015-05-07 17:36:24 UTC
Using bluetooth has a major impact on WiFi. Especially if you are connected in 2.4GHz. I always assume that users don't use bluetooth because they typically tell me when they do use bluetooth. Not in this case :)

Having a high latency when bluetooth is in use is normal, but I don't know if such a high latency is acceptable.
I'd have to check.
I'd love to hear from other users if the bug they are seeing is also when they use bluetooth.

Thanks.
Comment 39 sheksis 2015-05-07 18:09:46 UTC
Hmmm... I had the faintest clue that they could be interfering. I shifted to Channel 13 in WiFi, towards the tail end of the ISM band, and things improved dramatically. Now the latencies do increase, but only by a few hundred ms. They went into the seconds range, but only for 1-2 pings.

I've played three 1080p videos back to back now with no videos getting stuck.
Signs of a victory... Hoping it holds for the next couple of days :) Thanks for the pointer!
Comment 40 Emmanuel Grumbach 2015-05-07 18:25:04 UTC
Ok - but there is clearly a bug. Running 1080p with BT A2DP is definitely something that should be working. The problem is that debugging these issues is a pain because I have no clue about bluetooth's version, known issues etc...

The BT firmware is under /lib/firmware/intel/
I can see that a few BT firmwares were updated in February. I'd try to take the latest linux-firmware.git and copy the content of the intel/ dir to your /lib/firmware. Then you'll need to reboot.
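
The copy step described above could look like this. It is a minimal sketch, assuming Python 3.8 or later run as root and a local checkout of linux-firmware.git; the function name `copy_intel_firmware` and its parameters are made up for illustration.

```python
import shutil
from pathlib import Path


def copy_intel_firmware(repo, firmware_root="/lib/firmware"):
    """Copy repo/intel/* into firmware_root/intel, overwriting older files."""
    src = Path(repo) / "intel"
    dst = Path(firmware_root) / "intel"
    # dirs_exist_ok lets the copy merge into an existing intel/ directory.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return sorted(p.name for p in dst.iterdir())
```

The returned list of file names is only a convenience for checking what ended up in the target directory; the new firmware is picked up after the reboot mentioned above.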
Comment 41 sheksis 2015-05-07 19:14:14 UTC
@Emmanuel
Did as you said. Copied the new firmware into my intel/ directory. Also, reverted back to my old channel. Things are working correctly as of now. I'll watch it for a couple of days in case it's a fluke.
Comment 42 Emmanuel Grumbach 2015-05-07 19:17:14 UTC
That's the problem with BT Coex issues, they can be either in BT or in WiFi :)
Comment 43 sheksis 2015-05-07 19:32:11 UTC
*sigh* Hit it again:

64 bytes from 192.168.1.1: icmp_seq=83 ttl=254 time=1311 ms
64 bytes from 192.168.1.1: icmp_seq=84 ttl=254 time=1914 ms
64 bytes from 192.168.1.1: icmp_seq=85 ttl=254 time=3088 ms
64 bytes from 192.168.1.1: icmp_seq=88 ttl=254 time=2440 ms
64 bytes from 192.168.1.1: icmp_seq=89 ttl=254 time=2357 ms
64 bytes from 192.168.1.1: icmp_seq=90 ttl=254 time=2881 ms
64 bytes from 192.168.1.1: icmp_seq=91 ttl=254 time=2797 ms
64 bytes from 192.168.1.1: icmp_seq=92 ttl=254 time=2858 ms
64 bytes from 192.168.1.1: icmp_seq=93 ttl=254 time=2939 ms
64 bytes from 192.168.1.1: icmp_seq=94 ttl=254 time=3488 ms
64 bytes from 192.168.1.1: icmp_seq=95 ttl=254 time=3196 ms
64 bytes from 192.168.1.1: icmp_seq=96 ttl=254 time=2938 ms
64 bytes from 192.168.1.1: icmp_seq=97 ttl=254 time=2321 ms
64 bytes from 192.168.1.1: icmp_seq=98 ttl=254 time=1475 ms

The latencies have definitely improved with the new intel/* firmwares, as in I don't see them going beyond 2-3s ... and the 1080p video, when stuck, recovers faster.

Back to my channel changing again. Can we pull in some bluetooth firmware guys in this bug?
Comment 44 Emmanuel Grumbach 2015-05-07 20:38:37 UTC
I don't know about pulling in a BT guy, but we can collect logs from the firmware that would allow us to debug this.
For that, I'll need to take a customized firmware from the firmware team, but that specific area is in transition there... *sigh*....
Comment 45 Stefan Soeffing 2015-05-07 20:46:57 UTC
Looks like there are different issues going on. I'm not using bluetooth much. In fact, right now it's disabled - at least in blueman-applet; I don't know if that actually disables some hardware part.

@Comment 36: I'm using Ubuntu packages to install the kernel updates, and I haven't set up the environment to compile a custom kernel. So I'd rather wait for rc3 to come out, then test firmware -13.
Comment 46 sheksis 2015-05-07 21:02:04 UTC
@Stefan: Thanks for your input. It doesn't matter if your bluetooth is enabled in hardware or not. I noticed that my trouble begins when there is bluetooth *traffic* between my laptop and speaker. Since you do not have any device attached, it's not exactly my setup as far as BT is concerned.

@Emmanuel: Will wait for your guidance on how to proceed with this bluetooth issue. Maybe I'm hitting Stefan's issue as well as this new issue.
Comment 47 Jasper Mackenzie 2015-05-07 23:41:12 UTC
AAAAARG! and help!!
 I used the firmware you suggested with the kernel from iwlwifi-fixes.
I got great speeds etc for aaaages, then after suspend it was back to flaky slow transfers, high ping times etc.

BUT worst of all my machine now takes an age to boot with the following error (on _all_ kernels and usb-live iso)
usb 1-6 device read descriptor read/64, error -110
/devices/pci0000:00/0000:00:14.0/usb1 is taking to long
 AND I have no bluetooth on any kernel, which I am using and is essential for getting the thesis I am working on done (hair falling out stress etc).

I have tried all -12 firmwares from 143 up, no change. Bluetooth seems dead!!!

Attached is a dump with dmesg as well for when the connection went from flaky to fail.

Is there a work around to get the bt back up???
Comment 48 Jasper Mackenzie 2015-05-07 23:41:48 UTC
Created attachment 176161 [details]
iwl dump and dmesg snip
Comment 49 Jasper Mackenzie 2015-05-08 03:15:16 UTC
Crisis averted somehow. About half an hour after booting with an older iwlwifi-7260-12 firmware bluetooth mysteriously starts again and boot is fast without errors. Weird as, but good 8)
Comment 50 Emmanuel Grumbach 2015-05-08 05:20:27 UTC
@sheksis

Please check the module parameter bt_coex_active.
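
One way to check the parameter is to read it from sysfs. This is a minimal sketch, assuming Python 3 and that the iwlwifi module exposes `bt_coex_active` under the kernel's generic `/sys/module/<module>/parameters/` directory (boolean parameters usually read back as `Y` or `N`); the function name `read_module_param` is made up for illustration.

```python
from pathlib import Path


def read_module_param(name, module="iwlwifi", sys_module="/sys/module"):
    """Return the current value of a module parameter as a stripped string."""
    path = Path(sys_module) / module / "parameters" / name
    return path.read_text().strip()
```

On a machine with the driver loaded, `read_module_param("bt_coex_active")` would report whether coexistence handling is enabled; a `FileNotFoundError` would suggest the module is not loaded or the parameter is not exported.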

Please open a new bug mentioning the bt use case

Thanks
Comment 51 Tedd An 2015-05-08 11:16:58 UTC
Created attachment 176181 [details]
new bluetooth firmware

New bluetooth firmware has been submitted to linux-firmware.git but not accepted yet. Attached the same file.
Try this to see if it can improve the quality.

Copy this file to /lib/firmware/intel and restart the system (cold reboot)
Comment 52 sheksis 2015-05-08 12:44:24 UTC
64 bytes from 192.168.1.1: icmp_seq=56 ttl=254 time=1292 ms
64 bytes from 192.168.1.1: icmp_seq=57 ttl=254 time=760 ms
64 bytes from 192.168.1.1: icmp_seq=58 ttl=254 time=2056 ms
64 bytes from 192.168.1.1: icmp_seq=59 ttl=254 time=1351 ms
64 bytes from 192.168.1.1: icmp_seq=60 ttl=254 time=1405 ms
64 bytes from 192.168.1.1: icmp_seq=61 ttl=254 time=1266 ms
64 bytes from 192.168.1.1: icmp_seq=62 ttl=254 time=1863 ms
64 bytes from 192.168.1.1: icmp_seq=63 ttl=254 time=1271 ms
64 bytes from 192.168.1.1: icmp_seq=64 ttl=254 time=858 ms
64 bytes from 192.168.1.1: icmp_seq=65 ttl=254 time=504 ms
64 bytes from 192.168.1.1: icmp_seq=66 ttl=254 time=985 ms
64 bytes from 192.168.1.1: icmp_seq=67 ttl=254 time=1259 ms
64 bytes from 192.168.1.1: icmp_seq=68 ttl=254 time=590 ms
64 bytes from 192.168.1.1: icmp_seq=69 ttl=254 time=351 ms
64 bytes from 192.168.1.1: icmp_seq=70 ttl=254 time=810 ms
64 bytes from 192.168.1.1: icmp_seq=71 ttl=254 time=276 ms
64 bytes from 192.168.1.1: icmp_seq=72 ttl=254 time=1130 ms
64 bytes from 192.168.1.1: icmp_seq=73 ttl=254 time=763 ms
64 bytes from 192.168.1.1: icmp_seq=74 ttl=254 time=411 ms
64 bytes from 192.168.1.1: icmp_seq=75 ttl=254 time=1088 ms
64 bytes from 192.168.1.1: icmp_seq=76 ttl=254 time=1468 ms
64 bytes from 192.168.1.1: icmp_seq=77 ttl=254 time=1806 ms
64 bytes from 192.168.1.1: icmp_seq=78 ttl=254 time=1632 ms
64 bytes from 192.168.1.1: icmp_seq=79 ttl=254 time=1153 ms


The 1080p video ultimately got stuck with these latencies.


@Tedd Let me know what logs I can provide you.

Filed bug 97921 for this issue as Emmanuel had asked. Let's shift the bluetooth investigations there.
Comment 53 Emmanuel Grumbach 2015-05-10 06:18:37 UTC
@Stefan:

I got feedback from the firmware team. The link looks *very* bad. Both sides are using the lowest rate and even with this lowest rate, there are *tons* of failures. This is really unusual. Since you are seeing that with the new AP only, can you please check that the firmware of the AP is up to date?

The beacon timing of the AP is looking very bad: the AP is sending beacons at wrong timings, which means that we will miss the beacons when we are power saving. The timing function of the AP also looks wrong. The timer of the AP seems to be lying.

You can try to disable power save[1], but I don't see this as the source of all your problems. The AP is really behaving fishy.

[1] sudo iw wlan0 set power_save off
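
Whether the setting took effect can be verified afterwards. The following is a minimal sketch, assuming Python 3, that `iw dev wlan0 get power_save` is available, and that it prints a line of the form `Power save: on` or `Power save: off` (that output format is an assumption here); the function names are made up for illustration.

```python
import subprocess


def parse_power_save(output):
    """Return True for 'Power save: on', False for 'off', None otherwise."""
    for line in output.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "power save":
            value = value.strip().lower()
            if value in ("on", "off"):
                return value == "on"
    return None


def power_save_enabled(dev="wlan0"):
    """Query the interface with iw and parse the answer."""
    result = subprocess.run(
        ["iw", "dev", dev, "get", "power_save"],
        capture_output=True, text=True, check=True,
    )
    return parse_power_save(result.stdout)
```

Keeping the parsing separate from the `iw` call means the parser can be exercised on saved output without touching the wireless interface.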
Comment 54 Emmanuel Grumbach 2015-05-10 06:21:18 UTC
another thing. Please try to create a dump immediately after a ping latency of 10ms and above. This might shed more light.

If you can, we'd appreciate if you could create a dump with the first AP (the good one) for comparison.

Also - are the APs at the same place?

thanks.
Comment 55 Emmanuel Grumbach 2015-05-10 06:22:30 UTC
yet another thing :)

please send the output of iw wlan0 link.

Thanks.
Comment 56 Stefan Soeffing 2015-05-11 20:34:38 UTC
Created attachment 176441 [details]
dmesg with -13 firmware, -rc3

Emmanuel,

I tried -13 firmware with -rc3 (Comment 36), but I still get quite some errors in dmesg and no connection. Did the patch make it to -rc3?

---

Anyway thanks for the feedback. I didn't expect the AP to be the root of the issues, I only had good experience with the manufacturer so far. It is using the latest AP Firmware, actually.

For taking a new dump, should I wait until I have -13 up and running or get the dump with -12? Does it matter whether I create the dump with power save on or off?

And finally as requested:
iw wlan0 link
	SSID: Netzbox Lu
	freq: 2437
	RX: 2054577 bytes (5636 packets)
	TX: 615134 bytes (4363 packets)
	signal: -73 dBm
	tx bitrate: 48.0 MBit/s

	bss flags:	short-preamble short-slot-time
	dtim period:	1
	beacon int:	100

Note that the bitrate is fluctuating heavily; during times of high ping latency I see
	signal: -68 dBm
	tx bitrate: 11.0 MBit/s
and
	signal: -78 dBm
	tx bitrate: 1.0 MBit/s
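
For logging these values over time, the link output can be parsed and the frequency turned into a channel number. This is a minimal sketch, assuming Python 3 and the `iw wlan0 link` output format shown in this comment; the names `parse_link` and `channel_from_freq` are made up for illustration, and only the 2.4 GHz band is handled.

```python
import re

# Field name -> (pattern, converter); names are illustrative only.
FIELDS = {
    "freq_mhz": (r"freq:\s*(\d+)", int),
    "signal_dbm": (r"signal:\s*(-?\d+) dBm", int),
    "tx_bitrate_mbit": (r"tx bitrate:\s*([\d.]+) MBit/s", float),
}


def parse_link(text):
    """Extract frequency, signal level and TX bitrate from `iw ... link` output."""
    parsed = {}
    for name, (pattern, convert) in FIELDS.items():
        match = re.search(pattern, text)
        if match:
            parsed[name] = convert(match.group(1))
    return parsed


def channel_from_freq(freq_mhz):
    """Map a 2.4 GHz centre frequency in MHz to its channel number."""
    if freq_mhz == 2484:
        return 14
    if 2412 <= freq_mhz <= 2472 and (freq_mhz - 2407) % 5 == 0:
        return (freq_mhz - 2407) // 5
    raise ValueError(f"not a 2.4 GHz channel centre: {freq_mhz} MHz")


# Output copied from this comment.
sample = """\
    SSID: Netzbox Lu
    freq: 2437
    RX: 2054577 bytes (5636 packets)
    TX: 615134 bytes (4363 packets)
    signal: -73 dBm
    tx bitrate: 48.0 MBit/s
"""

link = parse_link(sample)
print(link, "channel", channel_from_freq(link["freq_mhz"]))
```

Running the parser once per second next to ping would show directly how the bitrate drops line up with the latency spikes.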
Comment 57 Emmanuel Grumbach 2015-05-11 20:40:29 UTC
My fixes didn't make it into -rc3. They are in net.git right now...

Please use -12.ucode to create the dump after > 10s latency. Please create it with power save disabled.

Your beacon interval is 100. Good. That's reasonable.
Your bitrate is very low when the pings get very slow. This is exactly what we saw in the dump you initially created.

Do you have another device that works well with this AP?
Do you have another Linux system that could record data on the air?
(another Intel device on a Linux machine can do that).
Comment 58 Stefan Soeffing 2015-05-12 10:26:01 UTC
Created attachment 176491 [details]
firmware dump, new and old AP

Did you mean 10s or 10ms latency?

See attached the dump with 25.17.12.0 for the old and the new AP. For the new AP, latency was ~5s when I did the fw_restart. The old AP is more stable; ping latency was at ~100ms.

Both APs were located in exactly the same place.

For comparison:
Old AP: iw wlan0 link
	SSID: Netzbox Lu
	freq: 2427
	RX: 37395346 bytes (29417 packets)
	TX: 2088375 bytes (19607 packets)
	signal: -61 dBm
	tx bitrate: 36.0 MBit/s

	bss flags:	short-slot-time
	dtim period:	3
	beacon int:	100
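
The `beacon int` and `dtim period` values reported for the two APs translate into wall-clock intervals as follows. This is a minimal worked sketch, assuming Python 3 and that the beacon interval is given in 802.11 time units of 1024 microseconds; the function names are made up for illustration.

```python
TU_US = 1024  # one 802.11 time unit in microseconds


def beacon_interval_ms(beacon_int_tu):
    """Convert a beacon interval from time units to milliseconds."""
    return beacon_int_tu * TU_US / 1000


def dtim_interval_ms(beacon_int_tu, dtim_period):
    """Time between DTIM beacons: every dtim_period-th beacon is a DTIM."""
    return dtim_period * beacon_interval_ms(beacon_int_tu)


# Values from the two `iw wlan0 link` outputs in this thread.
for name, dtim_period in (("new AP", 1), ("old AP", 3)):
    print(f"{name}: beacon every {beacon_interval_ms(100):.1f} ms, "
          f"DTIM every {dtim_interval_ms(100, dtim_period):.1f} ms")
```

Both APs beacon at the same interval, so the visible configuration difference between them is the DTIM period.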
Comment 59 Stefan Soeffing 2015-05-12 10:30:22 UTC
Besides the 7260 machine, I have a Windows computer here and three Android mobile devices. Even with the new AP there are no obvious major issues on the other devices.

I have an old linux machine around or could use a live USB to boot the Windows machine to Linux and set up wireshark there. Need to check the wireless cards, though.
Comment 60 sheksis 2015-05-13 02:43:23 UTC
Okay, today even with bluetooth off, I got the "no buffer" error. That's the most degenerate case of high latencies, I guess.

After some time, I also get
$ ping 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
From 192.168.1.4 icmp_seq=1 Destination Host Unreachable
From 192.168.1.4 icmp_seq=2 Destination Host Unreachable
From 192.168.1.4 icmp_seq=3 Destination Host Unreachable

However, the other devices connected to my AP keep functioning correctly. I was able to ping the AP through a terminal in my Android phone.
Comment 61 sheksis 2015-05-13 02:50:36 UTC
Also, the above was when I wasn't even playing any videos. So the traffic was minuscule.
Comment 62 Emmanuel Grumbach 2015-05-13 04:23:37 UTC
Were you able to create a firmware dump when that happens?

Please record tracing. I'll look at the tracing and send the firmware dump to the firmware people.
I'd like to know also if restarting the firmware (with the debugfs hook) helps.
Thanks.
Comment 63 sheksis 2015-05-13 13:04:47 UTC
No, I wasn't able to create the dump. This time I've set up the uSniffer firmware. Once the issue gets hit, I'll update this bug.
Comment 64 sheksis 2015-05-14 02:02:32 UTC
@Emmanuel

So I have a question: when recording the trace, the wiki has several switches listed too. Do you want any specific switches, or should I just do

$ sudo trace-cmd record -e iwlwifi
Comment 65 Emmanuel Grumbach 2015-05-14 03:19:42 UTC
That should be enough.
Comment 66 Stefan Soeffing 2015-05-17 08:32:51 UTC
Emmanuel, to create attachment 176491 [details]  I didn't have tracing running - do I need to recreate those while tracing is active?
Comment 67 Emmanuel Grumbach 2015-05-17 08:34:34 UTC
No - you don't.
Comment 68 Emmanuel Grumbach 2015-05-18 17:26:21 UTC
4.1-rc4 can use -13.ucode
Comment 69 Stefan Soeffing 2015-05-18 20:49:09 UTC
Obviously that's another story, but rc4 doesn't boot here at all...
Comment 70 Emmanuel Grumbach 2015-05-21 05:49:17 UTC
please ignore the -13.ucode in this bug and take the -13.ucode from:

https://git.kernel.org/cgit/linux/kernel/git/iwlwifi/linux-firmware.git/tree/

It contains more bug fixes.
Comment 71 Stefan Soeffing 2015-05-23 09:08:16 UTC
Emmanuel,

another interesting observation: To do some testing I've been switching a lot between old and new AP. Both are configured to provide the same SSID, so I never run them at the same time. It happens frequently that after switching the AP, the card doesn't find the network any more. It will list other networks in the vicinity, but not my own one. Reloading the iwlwifi module fixes this. No problem with other devices.

(This is still with 25.17.12.0 on 4.0. I'll test more as soon as I get 4.1 running)
Comment 72 Emmanuel Grumbach 2015-05-24 18:40:34 UTC
That's a separate issue. Maybe due to scan offload. Not sure.
Comment 73 Stefan Soeffing 2015-05-25 08:09:56 UTC
Unfortunately, it's still there:

64 bytes from 192.168.1.1: icmp_seq=54 ttl=64 time=3.84 ms
64 bytes from 192.168.1.1: icmp_seq=55 ttl=64 time=810 ms
64 bytes from 192.168.1.1: icmp_seq=56 ttl=64 time=683 ms
64 bytes from 192.168.1.1: icmp_seq=57 ttl=64 time=384 ms
64 bytes from 192.168.1.1: icmp_seq=58 ttl=64 time=592 ms
64 bytes from 192.168.1.1: icmp_seq=59 ttl=64 time=3209 ms

with 25.27.13.0 on 4.1-rc5.
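
To pick the bad stretches out of a long ping log, the replies can be filtered by round-trip time. A minimal sketch, assuming the ping output format shown above; `slow_pings` is a hypothetical helper name and the 1000 ms default is an arbitrary choice.

```shell
# Hypothetical filter: read ping output on stdin and print only the replies
# whose round-trip time exceeds a threshold in milliseconds (default 1000).
slow_pings() {
    awk -v limit="${1:-1000}" '
        match($0, /time=[0-9.]+/) {
            # "time=" is 5 characters long; the rest of the match is the value
            rtt = substr($0, RSTART + 5, RLENGTH - 5) + 0
            if (rtt > limit) print
        }
    '
}

# Example: ping 192.168.1.1 | slow_pings 500
```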
Comment 74 Emmanuel Grumbach 2015-06-02 06:39:36 UTC
-13.ucode is now officially published.

You can get it here:

https://git.kernel.org/cgit/linux/kernel/git/iwlwifi/linux-firmware.git/
It includes a few more fixes than the versions I attached to this bug.
Note that the version number might be the same. I encourage everybody to move to the version from the git repository.
Comment 75 Emmanuel Grumbach 2015-06-02 06:43:21 UTC
Created attachment 178531 [details]
Core10 firmware -13.ucode with uSniffer

Can you please create a firmware dump with the firmware attached?

This is a debug version of the same firmware in the git repository.
You'll need to create the dump immediately after the ping time is high.
When someone creates a dump, please let me know what was the channel used and the bandwidth.

Thank you.
Comment 76 Stefan Soeffing 2015-06-02 17:04:49 UTC
This is 7265, I'd need 7260. I assume they are not compatible?
Comment 77 Emmanuel Grumbach 2015-06-02 17:34:08 UTC
Created attachment 178581 [details]
Core10 firmware -13.ucode with uSniffer

Here you go.
Comment 78 Stefan Soeffing 2015-06-02 18:33:13 UTC
Created attachment 178611 [details]
firmware dump with -13 firmware

Thanks.

Attached find the firmware dump at ~5000ms ping time, kernel 4.1-rc6, wifi channel 6, 2437 MHz.

Is there a way to get the bandwidth info from the driver? My AP settings are somewhat cryptic, I'd assume that it is using 20MHz only, but I'm not sure. It is set to b+g mode (802.11n disabled).
Comment 79 Emmanuel Grumbach 2015-06-02 18:40:32 UTC
11n disabled means it can't be 40 MHz.
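
For cross-checking the freq value that `iw wlan0 link` reports against the channel configured on the AP, the usual 2.4 GHz mapping can be applied. A small sketch; `freq_to_channel` is a hypothetical helper name.

```shell
# Map a 2.4 GHz centre frequency in MHz to its channel number.
# Channels 1-13 are spaced 5 MHz apart starting at 2412 MHz;
# channel 14 sits apart at 2484 MHz.
freq_to_channel() {
    if [ "$1" -eq 2484 ]; then
        echo 14
    elif [ "$1" -ge 2412 ] && [ "$1" -le 2472 ] && [ $(( ($1 - 2412) % 5 )) -eq 0 ]; then
        echo $(( ($1 - 2407) / 5 ))
    else
        echo "not a 2.4 GHz channel frequency: $1" >&2
        return 1
    fi
}
```

With the values quoted in this bug, 2437 MHz maps to channel 6 and 2462 MHz to channel 11.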
Comment 80 Jasper Mackenzie 2015-06-03 04:54:15 UTC
Today I tried with your latest attached -13 firmware and today's iwlwifi-fixes kernel.
 Moving bulk data with rsync or whatever whilst pinging google shows that all is good. Transfer speeds were not amazing, 3-5Mb/s.
 But, there is always a but, bluetooth does not work. The device is detected but does not allow any external peripherals to connect. What info can I give you please?


Connected to 00:22:3f:e5:ab:90 (on wlan0)
	SSID: TempleThumper
	freq: 2462
	RX: 3315480255 bytes (2145912 packets)
	TX: 28941464 bytes (285533 packets)
	signal: -48 dBm
	tx bitrate: 117.0 MBit/s MCS 14

	bss flags:	short-preamble short-slot-time
	dtim period:	3
	beacon int:	100

=== dmesg ===
[  257.850454] Bluetooth: HIDP (Human Interface Emulation) ver 1.2
[  257.850460] Bluetooth: HIDP socket layer initialized
[  283.640849] Bluetooth: Unexpected continuation frame (len 1)
[  411.330340] Bluetooth: hci0 command 0x0804 tx timeout
[  495.561361] Bluetooth: hci0 command 0x041f tx timeout
[  497.567444] Bluetooth: hci0 command 0x0406 tx timeout
Comment 81 Emmanuel Grumbach 2015-06-03 05:37:03 UTC
BT is not tracked in this bug.

Thank you for the report.
Comment 82 Stefan Soeffing 2015-07-05 13:10:42 UTC
Unfortunately no improvement with kernel 4.1.1 and 25.30.13.0.

However, I discovered today that the situation is much better on channel 1. I've been trying with channel 3, 6, 11, and 13 - and had the issue on all those. Now I switched to channel 1 and it seems to be much better.

I had not expected this, since scanning for other networks shows that channel 1 is pretty crowded (8 other networks on that channel, plus another one on channel 3).

Any clues already how to fix this?


Btw, I also submitted a new bug 100961 regarding the issue described in comment 71.
Comment 83 Emmanuel Grumbach 2015-07-05 13:40:27 UTC
(In reply to Stefan Soeffing from comment #82)
> Unfortunately no improvement with kernel 4.1.1 and 25.30.13.0.
> 
> However, I discovered today that the situation is much better on channel 1.
> I've been trying with channel 3, 6, 11, and 13 - and had the issue on all
> those. Now I switched to channel 1 and it seems to be much better.
> 
> I had not expected this, since scanning for other networks shows that
> channel 1 is pretty crowded (8 other networks on that channel, plus another
> one on channel 3).
> 
> Any clues already how to fix this?
> 

Please create a dump on Channel 1 and another one on "bad" channel.
I will forward the data to the firmware team.

> 
> Btw, I also submitted a new bug 100961 regarding the issue described in
> comment 71.
I just added Intel to that bug. Please add ilw@linux.intel.com on any Intel WiFi Linux related issues.
Comment 84 Stefan Soeffing 2015-07-05 13:44:58 UTC
I will create one with channel 1. Is the dump from comment 78 enough for comparison?
Comment 85 Emmanuel Grumbach 2015-07-05 16:45:08 UTC
(In reply to Stefan Soeffing from comment #84)
> I will create one with channel 1. Is the dump from comment 78 enough for
> comparison?

Probably yes.
Comment 86 Stefan Soeffing 2015-07-14 16:02:20 UTC
Created attachment 182641 [details]
firmware dump, ch 1

This is a firmware dump on channel 1, firmware 25.30.13.0.

To be honest, I'm not completely sure if that channel is really "better". It's more of a feeling than hard evidence. I can watch youtube videos with (probably) fewer interruptions. But ping times are still in the range of a few seconds while watching.
Comment 87 Stefan Soeffing 2015-08-01 20:33:36 UTC
In fact, today I got evidence that my last observations must have been nothing more than luck. I haven't changed anything in the router configuration, but today I have severe problems with the connection again. Ping times at ~10s, taking almost a minute to recover.

Emmanuel, is someone looking into that? Any more information I could provide?
Comment 88 Stefan Mashkevich 2015-09-08 02:07:39 UTC
Created attachment 187011 [details]
dmesg output

I seem to be having the same problem.

Fedora 22
iwl7260-firmware-25.17.12.0-53.fc22.noarch

# cat /etc/modprobe.d/iwlwifi.conf 
options iwlwifi 11n_disable=1

dmesg output attached. Any other information I could provide, or any experiments I could try? Thanks!
Comment 89 Luca Coelho 2015-09-08 04:30:28 UTC
I'm sorry to hear about all these troubles.  Emmanuel is on vacation now, so I'm backing him up for now.  I'll ping our firmware team to check whether they have any news on this issue.
Comment 90 Luca Coelho 2015-09-16 05:52:49 UTC
This is still in our firmware team's hands, but unfortunately we don't have much progress to report yet.  I'll keep you updated.
Comment 91 Emmanuel Grumbach 2015-10-10 17:49:24 UTC
New versions of the firmware are now available.

There is at least one issue identified in a few of the logs that I know is *not* fixed, so unfortunately I can't promise it will help. But it is worth trying...

You can find them here:
https://git.kernel.org/cgit/linux/kernel/git/iwlwifi/linux-firmware.git/

I suggest to test -13.ucode and above.
-13 is pretty much end of life. I suggest to give it a try since it did get a few updates since the version I attached to this bug. -15 and -16 are far more interesting to test.

-17.ucode is still under stabilization.

Thanks.
Comment 92 Jerome Leclanche 2015-10-14 19:01:22 UTC
Created attachment 190231 [details]
dmesg output

I've been experiencing this issue on my asus UX303L on Arch Linux, kernel 4.2.3 - every time I download a few hundred kilobytes, everything hangs for 10 seconds. At first, then I booted a system from a different computer on there off an external drive (same kernel) and experienced this issue; then I rebooted on the main system and the issue was showing itself on *both* systems.

I tried -13, -14, -15 and none of them showed any difference. I also tried -16 and -17 but neither of them could load, couldn't figure out why.

Attached is a dmesg from a recent boot showcasing the issue multiple times. Let me know if there's more I can provide.

lspci: 02:00.0 Network controller: Intel Corporation Wireless 7260 (rev 6b)
Comment 93 Emmanuel Grumbach 2015-10-14 19:09:54 UTC
-16 and -17 can't be loaded from 4.2.3.

You'll have to use our backport tree for that.
You can find it here:
https://git.kernel.org/cgit/linux/kernel/git/iwlwifi/backport-iwlwifi.git/
Comment 94 Jerome Leclanche 2015-10-15 22:41:15 UTC
Created attachment 190291 [details]
dmesg with drive plugged in/out

I just realized something.

On my main system, this issue happens *only* when I have my SSD plugged in to my USB3 port. Maybe this is a power issue of sorts?

New dmesg attached, including the drive being plugged in and out much later and the hang happening a few times. Does this help?
Comment 95 Emmanuel Grumbach 2015-10-16 01:16:18 UTC
This is still 15.ucode. You may have put 16.ucode on your file system, but the driver is still using 15.ucode.

You need to update the driver.
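
The kernel log shows which firmware the driver actually loaded, independent of what files are on the file system. A minimal sketch for pulling that out, assuming the "loaded firmware version" log line format that appears in this bug; `loaded_fw_version` is a hypothetical helper name.

```shell
# Hypothetical helper: read kernel log lines on stdin and print the firmware
# version from the last iwlwifi "loaded firmware version" line, if any.
loaded_fw_version() {
    sed -n 's/.*iwlwifi .*loaded firmware version \([^ ]*\).*/\1/p' | tail -n 1
}

# Example: dmesg | loaded_fw_version
```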
Comment 96 Jerome Leclanche 2015-10-16 01:19:34 UTC
Yes, I'm aware - I'm stuck on kernel 4.2.3 right now, and compiling the backports isn't something I have time to look into just yet. But I can reliably stop and reproduce the issue by plugging my ssd in or out of the USB socket.
Comment 97 Emmanuel Grumbach 2015-10-16 01:28:08 UTC
Very strange....

I don't really buy the power theory. Because the issue seems to be a radio problem. I would lean towards an interference between the hard disk operation and the WiFi radio. But that is also a wild crazy guess.
Comment 98 Emmanuel Grumbach 2015-10-16 01:29:29 UTC
Once you update your driver, I'll help you collect the firmware logs.
Comment 99 sheksis 2015-10-17 03:46:26 UTC
I initiated a 2.6GB download for something. I had several instances where the
speed fell off the cliff. Because:

64 bytes from 192.168.1.1: icmp_seq=449 ttl=255 time=2426 ms
64 bytes from 192.168.1.1: icmp_seq=450 ttl=255 time=2879 ms
64 bytes from 192.168.1.1: icmp_seq=451 ttl=255 time=2835 ms
64 bytes from 192.168.1.1: icmp_seq=452 ttl=255 time=3990 ms
64 bytes from 192.168.1.1: icmp_seq=453 ttl=255 time=4479 ms
64 bytes from 192.168.1.1: icmp_seq=454 ttl=255 time=5968 ms
64 bytes from 192.168.1.1: icmp_seq=455 ttl=255 time=5159 ms
64 bytes from 192.168.1.1: icmp_seq=456 ttl=255 time=4496 ms
64 bytes from 192.168.1.1: icmp_seq=457 ttl=255 time=7871 ms
64 bytes from 192.168.1.1: icmp_seq=458 ttl=255 time=7060 ms
64 bytes from 192.168.1.1: icmp_seq=459 ttl=255 time=6145 ms
64 bytes from 192.168.1.1: icmp_seq=460 ttl=255 time=8374 ms
64 bytes from 192.168.1.1: icmp_seq=461 ttl=255 time=7375 ms
64 bytes from 192.168.1.1: icmp_seq=462 ttl=255 time=6376 ms
64 bytes from 192.168.1.1: icmp_seq=463 ttl=255 time=5380 ms
64 bytes from 192.168.1.1: icmp_seq=464 ttl=255 time=4382 ms
64 bytes from 192.168.1.1: icmp_seq=465 ttl=255 time=3387 ms
64 bytes from 192.168.1.1: icmp_seq=466 ttl=255 time=2395 ms
64 bytes from 192.168.1.1: icmp_seq=467 ttl=255 time=1405 ms
64 bytes from 192.168.1.1: icmp_seq=468 ttl=255 time=419 ms
64 bytes from 192.168.1.1: icmp_seq=469 ttl=255 time=236 ms


Since bluetooth isn't turned on, it is definitely this bug. Here's the dmesg to begin with:

$ journalctl -b0 --no-pager | grep iwlwifi
Oct 17 08:26:53 shatrupa kernel: iwlwifi 0000:04:00.0: loaded firmware version 17.228510.0 op_mode iwlmvm
Oct 17 08:26:53 shatrupa kernel: iwlwifi 0000:04:00.0: Detected Intel(R) Dual Band Wireless AC 7260, REV=0x144
Oct 17 08:26:53 shatrupa kernel: iwlwifi 0000:04:00.0: L1 Enabled - LTR Enabled
Oct 17 08:26:53 shatrupa kernel: iwlwifi 0000:04:00.0: L1 Enabled - LTR Enabled
Oct 17 08:26:53 shatrupa kernel: iwlwifi 0000:04:00.0 wlp4s0: renamed from wlan0
Oct 17 08:26:55 shatrupa NetworkManager[931]: <info>  rfkill2: found WiFi radio killswitch (at /sys/devices/pci0000:00/0000:00:1c.1/0000:04:00.0/ieee80211/phy0/rfkill2) (driver iwlwifi)
Oct 17 08:26:55 shatrupa NetworkManager[931]: <info>  (wlp4s0): new 802.11 WiFi device (carrier: UNKNOWN, driver: 'iwlwifi', ifindex: 3)
Oct 17 08:26:55 shatrupa kernel: iwlwifi 0000:04:00.0: L1 Enabled - LTR Enabled
Oct 17 08:26:55 shatrupa kernel: iwlwifi 0000:04:00.0: L1 Enabled - LTR Enabled
Oct 17 08:26:55 shatrupa kernel: iwlwifi 0000:04:00.0: L1 Enabled - LTR Enabled
Oct 17 08:26:55 shatrupa kernel: iwlwifi 0000:04:00.0: L1 Enabled - LTR Enabled
Oct 17 08:26:58 shatrupa kernel: iwlwifi 0000:04:00.0 wlp4s0: disabling HT/VHT due to WEP/TKIP use
Oct 17 08:26:58 shatrupa kernel: iwlwifi 0000:04:00.0 wlp4s0: disabling HT as WMM/QoS is not supported by the AP
Oct 17 08:26:58 shatrupa kernel: iwlwifi 0000:04:00.0 wlp4s0: disabling VHT as WMM/QoS is not supported by the AP


The kernel and firmware are the latest ones based on the guidance in bug 97921.
Comment 100 Stefan Soeffing 2015-10-23 17:46:19 UTC
Emmanuel,

I just tried the new versions. I'm running now with
kernel 4.3.0-040300rc6-generic
iwlwifi 0000:03:00.0: loaded firmware version 17.228510.0 op_mode iwlmvm
(bluetooth turned off)

Still I get at times:
64 bytes from 192.168.1.1: icmp_seq=90 ttl=64 time=3927 ms
64 bytes from 192.168.1.1: icmp_seq=91 ttl=64 time=2950 ms
64 bytes from 192.168.1.1: icmp_seq=92 ttl=64 time=2830 ms
...

Anything else I should look for? Btw, I didn't install the bluetooth firmware update from bug 97921, does this make a difference?

What can I provide for further debugging, do you need another firmware dump?
Comment 101 Stefan Soeffing 2015-10-23 17:47:16 UTC
Btw, I also tried with -16 firmware - same thing there as well.
Comment 102 Emmanuel Grumbach 2015-10-28 20:36:18 UTC
Created attachment 191431 [details]
Core14 FW with uSniffer

Can you please record a firmware dump with this firmware?

Please stop the collection as soon as you have long ping latencies.
Thanks.
Comment 103 Stefan Soeffing 2015-10-31 20:48:58 UTC
Created attachment 191761 [details]
firmware dump

Hi Emmanuel,

please find attached a new dump. While watching a video on youtube, I got

64 bytes from 192.168.1.1: icmp_seq=2040 ttl=64 time=22945 ms
64 bytes from 192.168.1.1: icmp_seq=2041 ttl=64 time=22864 ms
...


Thanks.
Comment 104 Stefan Soeffing 2015-12-29 10:07:24 UTC
Hi Emmanuel,

thanks to your hint I just read the related report, bug 103531. The problem looks similar (though I get slow connection only, no firmware crashes). Do you have a clue whether both issues have the same root cause?

I'd be happy to assist in providing more feedback if that helps.
Comment 105 sheksis 2015-12-30 06:09:40 UTC
I also occasionally face this issue, say once in 3 days, and I am running the following

Kernel
-------
$ uname -r
4.4.0-0.rc6.git1.1.fc24.x86_64

Logs
------
Dec 30 09:08:04 shatrupa kernel: iwlwifi 0000:04:00.0: loaded firmware version 16.242414.0 op_mode iwlmvm
Comment 106 Emmanuel Grumbach 2016-02-29 07:38:24 UTC
Created attachment 206391 [details]
Core14 FW with more debug probes

Our firmware team introduced more debug prints to try to understand what is going on.

Can you please run with the firmware attached and produce a dump once again?

Thank you.
Comment 107 Stefan Soeffing 2016-02-29 16:38:20 UTC
Emmanuel,

thanks, I will do that, give me a few days. 

To be honest, I'm  somewhat disappointed with that device - that's why I ordered a 3160 card yesterday as a replacement. Hopefully this gives a more stable connection.
Comment 108 Stefan Soeffing 2016-03-03 20:54:52 UTC
Created attachment 206721 [details]
firmware dump

Emmanuel,

I created a new dump using
echo 1 > /sys/kernel/debug/iwlwifi/0000:03:00.0/iwlmvm/fw_restart
cat /sys/devices/virtual/devcoredump/devcd1/data > iwl.dump

while I had high latency (ping time ~ 10s loading a youtube video)

This is with kernel 4.5.0-040500rc6-generic and
iwlwifi 0000:03:00.0: loaded firmware version 17.295852.0 op_mode iwlmvm

Did I miss something? Please let me know.
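
For anyone repeating this on another machine, the two commands above can be wrapped as follows. This is only a sketch: the function names are hypothetical, the PCI address and the devcd node differ per system and are passed in rather than guessed, and the short sleep is an assumption rather than a documented requirement. It needs to run as root.

```shell
# Build the debugfs path of the fw_restart hook for a given PCI address.
fw_restart_path() {
    # $1: PCI address of the card, e.g. 0000:03:00.0
    echo "/sys/kernel/debug/iwlwifi/$1/iwlmvm/fw_restart"
}

# Trigger a firmware restart and save the resulting devcoredump.
collect_fw_dump() {
    # $1: PCI address, $2: devcoredump node (e.g. devcd1), $3: output file
    echo 1 > "$(fw_restart_path "$1")" || return 1
    sleep 2   # assumption: give the driver a moment to publish the dump
    cat "/sys/devices/virtual/devcoredump/$2/data" > "$3"
}

# Example: collect_fw_dump 0000:03:00.0 devcd1 iwl.dump
```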
Comment 109 Emmanuel Grumbach 2016-03-03 20:58:38 UTC
Excellent - thanks ... again...
Comment 110 Emmanuel Grumbach 2016-03-06 17:45:47 UTC
Stefan,

I forwarded your data to our firmware team for analysis. Hezi Hezkiyahu will share his findings in bugzilla.
Comment 111 Hezi Hezkiyahu 2016-03-07 07:48:40 UTC
Hello

Looking at the logs, I don’t see any evidence for a bug, but I do see that there is some link issue, meaning the signal your wifi is receiving is weak, I also suspect a problem in one of your antennas and/or interference from other routers in the area.

To pin-point the issue, we need to check for a few things, please try to do the following:

1. How many brick/concrete walls are there between the router and your device? Can you move it? Moving the router a little may resolve most of the issues.
2. Please run a scan and attach the results (you can use ‘sudo iw wlan0 scan > scan_list.txt’).
3. Did you install the wifi card by yourself? It may be that one of the antennas is not connected properly; can you make sure the antenna connectors are firmly attached?
4. You may also install a ‘wifi analyzer’ on your cellular phone and use it. It will show the scan results in a graphical display, and show you all other routers in the area and their signal strength. It may be that changing your router’s channel will also solve the issue.
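
The scan results from step 2 can also be condensed into one line per network to see which frequencies are crowded. A minimal sketch, assuming the usual `iw` scan layout (an unindented "BSS ..." line per network followed by indented freq, signal and SSID lines); `scan_summary` is a hypothetical helper name.

```shell
# Hypothetical helper: read `iw wlan0 scan` output on stdin and print one
# "<freq> <signal> <SSID>" line per network found.
scan_summary() {
    awk '
        function emit() { if (have) print freq, sig, ssid }
        /^BSS /           { emit(); have = 1; freq = sig = ssid = "" }
        /^[ \t]+freq: /   { freq = $2 }
        /^[ \t]+signal: / { sig = $2 }
        /^[ \t]+SSID: /   { sub(/^[ \t]+SSID: /, ""); ssid = $0 }
        END               { emit() }
    '
}

# Example: sudo iw wlan0 scan | scan_summary | sort -n
```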

Thanks a lot.

Hezi Hezkiyahu
INTEL WIFI FW team.
Comment 112 Stefan Soeffing 2016-03-07 09:55:44 UTC
Hello Hezi,

thanks for looking into this.

There's a single concrete wall and the router is located in ~4m distance. I played already quite a bit with channels, this was already using the best combination I found (could be much worse on other channels).

No matter what, I just replaced the card by a new 3160 card. Both antenna cables were firmly attached to the connectors of the 7260 when I removed it, so I don't think this caused my issues (in fact, I hadn't touched the card before, it came pre-installed with the main board when I bought it).

Whatsoever, with the 3160 it's a whole other story; data rates are up to 1500kb/s where I got 150-300kb/s before. Almost no latency, ping rates in the ms range where they should be; youtube back to usable.

All other parameters were unchanged, by the way (channel settings, router / antenna location, etc.)...
