Bug 103531 - iwlwifi: 7260: device connected to station but no connectivity (MWG100247592)
Summary: iwlwifi: 7260: device connected to station but no connectivity (MWG100247592)
Status: CLOSED WILL_NOT_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless
Hardware: All Linux
Importance: P1 normal
Assignee: DO NOT USE - assign "network-wireless-intel" component instead
URL:
Keywords:
: 107471
Depends on:
Blocks:
 
Reported: 2015-08-26 03:58 UTC by ailin.nemui+kernel
Modified: 2016-02-15 21:02 UTC
6 users

See Also:
Kernel Version: 4.1.6
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg output attached (51.95 KB, text/plain)
2015-08-26 04:26 UTC, ailin.nemui+kernel
wireshark capture of iwlwifi card itself (11.79 KB, application/pgp-encrypted)
2015-09-22 00:19 UTC, ailin.nemui+kernel
this capture done from 2nd device: connectivity is lost -> connectivity restored by hard reset (1.89 MB, application/pgp-encrypted)
2015-10-02 18:39 UTC, ailin.nemui+kernel
capture from 2nd device: connectivity works -> connectivity broken (3.91 MB, application/pgp-encrypted)
2015-10-03 12:08 UTC, ailin.nemui+kernel
Core14 FW with uSniffer (1.00 MB, application/octet-stream)
2015-11-16 08:46 UTC, Luca Coelho
trace.dat encrypted vinilox (2.28 MB, application/pgp-encrypted)
2015-11-18 13:12 UTC, Vinilox
Dmesg with CONFIG_IWLWIFI_DEBUG=y (21.82 KB, application/pgp-encrypted)
2015-11-18 13:23 UTC, Vinilox
FW dump (400.19 KB, application/pgp-encrypted)
2015-11-18 14:18 UTC, Vinilox
fwdump (768.71 KB, application/pgp-encrypted)
2015-11-18 20:13 UTC, Vinilox
dmesg (64.96 KB, application/octet-stream)
2015-11-18 20:54 UTC, Vinilox
Core14 FW with uSniffer with TX_MNG probes (1.00 MB, application/octet-stream)
2015-11-19 07:34 UTC, Emmanuel Grumbach
trace (413.98 KB, application/pgp-encrypted)
2015-11-19 08:28 UTC, Vinilox
iwl.dump (75.78 KB, application/pgp-encrypted)
2015-11-19 08:28 UTC, Vinilox
dmesg (59.27 KB, application/octet-stream)
2015-11-19 08:30 UTC, Vinilox
Core14 FW with uSniffer with TX_MNG probes (1.00 MB, application/octet-stream)
2015-11-19 09:29 UTC, Emmanuel Grumbach
dmesg (59.54 KB, application/octet-stream)
2015-11-19 09:40 UTC, Vinilox
trace (409.68 KB, application/pgp-encrypted)
2015-11-19 09:41 UTC, Vinilox
iwl.dump (793.57 KB, application/pgp-encrypted)
2015-11-19 09:43 UTC, Vinilox
iwl.dump 11n_disable (967.33 KB, application/pgp-encrypted)
2015-11-19 12:33 UTC, Vinilox
iwl.dump quiet environment (938.58 KB, application/pgp-encrypted)
2015-11-23 08:02 UTC, Vinilox
trace.dat quiet environment (1.16 MB, application/pgp-encrypted)
2015-11-23 08:02 UTC, Vinilox
dmesg quiet environment (57.25 KB, application/octet-stream)
2015-11-23 08:05 UTC, Vinilox
iwl.dump cfg80211_disable_40mhz_24ghz=true (780.15 KB, application/pgp-encrypted)
2015-11-25 10:21 UTC, Vinilox
trace.dat cfg80211_disable_40mhz_24ghz=true (654.30 KB, application/pgp-encrypted)
2015-11-25 10:21 UTC, Vinilox
dmesg cfg80211_disable_40mhz_24ghz=true (59.70 KB, application/octet-stream)
2015-11-25 10:24 UTC, Vinilox
dmesg router set to 20MHz (51.15 KB, application/octet-stream)
2015-11-25 10:40 UTC, Vinilox
iwl.dump router set to 20MHz (1.00 MB, application/pgp-encrypted)
2015-11-25 10:45 UTC, Vinilox
scan router set to 20MHz (3.61 KB, application/pgp-encrypted)
2015-11-25 10:51 UTC, Vinilox
router parameters (46.38 KB, image/png)
2015-11-25 10:53 UTC, Vinilox
iwl.dump cfg80211_disable_40mhz_24ghz=Y (1021.07 KB, application/pgp-encrypted)
2015-11-25 11:09 UTC, Vinilox
dmesg cfg80211_disable_40mhz_24ghz=Y (51.14 KB, application/octet-stream)
2015-11-25 11:12 UTC, Vinilox
dmesg with patched kernel (51.35 KB, application/octet-stream)
2015-12-09 13:04 UTC, Vinilox
iwl.dump with patched kernel (1.04 MB, application/pgp-encrypted)
2015-12-09 13:05 UTC, Vinilox
dmesg power_scheme=1 (58.34 KB, application/octet-stream)
2016-01-22 16:43 UTC, Vinilox
trace.dat power_scheme=1 (407.93 KB, application/pgp-encrypted)
2016-01-22 16:43 UTC, Vinilox
iwl.dump power_scheme=1 (739.75 KB, application/pgp-encrypted)
2016-01-22 16:44 UTC, Vinilox

Description ailin.nemui+kernel 2015-08-26 03:58:36 UTC
Hello, I have a problem with my Dell XPS 12. The wireless connection just stops working. NetworkManager still shows me as connected, but a ping to the router gives "Destination Host Unreachable". Disabling and re-enabling wireless restores the connection for a little while. Sometimes this happens very frequently (every few minutes); sometimes I can stay connected for a whole evening.

It seems weird, but in my experience, seeking in YouTube videos makes the issue appear more quickly.

It sounds similar to bug 93431 which is supposed to be fixed.


I installed iwlwifi from backport-iwlwifi version 4fd69991

Kernel: openSUSE 4.1.6-2.gce0123d-desktop

firmware-version: 15.209721.0 from iwlwifi/linux-firmware

The issue was also present with the original drivers that came with SUSE.


Does this sound like a firmware issue or broken hardware?

I attached a trace-cmd recording from when this happens. A ping command running in parallel shows:

64 bytes from 192.168.1.1: icmp_seq=979 ttl=64 time=34.5 ms
64 bytes from 192.168.1.1: icmp_seq=980 ttl=64 time=31.5 ms
64 bytes from 192.168.1.1: icmp_seq=981 ttl=64 time=53.2 ms
64 bytes from 192.168.1.1: icmp_seq=982 ttl=64 time=13.4 ms
64 bytes from 192.168.1.1: icmp_seq=983 ttl=64 time=1.27 ms
64 bytes from 192.168.1.1: icmp_seq=984 ttl=64 time=29.7 ms
From 192.168.1.104 icmp_seq=1014 Destination Host Unreachable
From 192.168.1.104 icmp_seq=1015 Destination Host Unreachable
From 192.168.1.104 icmp_seq=1016 Destination Host Unreachable
From 192.168.1.104 icmp_seq=1017 Destination Host Unreachable


Nothing appears in dmesg.
Comment 1 ailin.nemui+kernel 2015-08-26 04:06:40 UTC
(In reply to ailin.nemui+kernel from comment #0)
I didn't manage to time the recording properly, so it became unnecessarily large (too large for Bugzilla). Maybe you can still use it even if you have to download it from a third-party site? Otherwise I'll try again...

Download site -> http://s000.tinyupload.com/index.php?file_id=45896859822207246811
Comment 2 Emmanuel Grumbach 2015-08-26 04:22:57 UTC
What does ethtool -i wlan0 say?

Please provide dmesg output.
Comment 3 Emmanuel Grumbach 2015-08-26 04:24:27 UTC
Ah you have a trace recording. I'll look at it.
Please provide the ethtool output though.
Comment 4 ailin.nemui+kernel 2015-08-26 04:26:08 UTC
Thanks for your effort! 

ethtool -i wlp4s0 
driver: iwlwifi
version: 4.1.6-2.gce0123d-desktop
firmware-version: 15.209721.0
bus-info: 0000:04:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no
Comment 5 ailin.nemui+kernel 2015-08-26 04:26:42 UTC
Created attachment 185881 [details]
dmesg output attached
Comment 6 Emmanuel Grumbach 2015-08-26 05:32:54 UTC
Do you have another machine that you could put in monitor mode to sniff the traffic?
I could fetch the trace-cmd file, but I can't see any problem there.
Did you stop it when you had issues?
Comment 7 ailin.nemui+kernel 2015-08-27 22:37:02 UTC
I hit ^C when I saw the ping failing, so near the end of that trace I could not use the Internet anymore.

I have another computer for traffic monitoring (Intel 3160). The problematic computer has now been stable for a day, and since I have no real idea yet how to *force* the problem to appear, I am more or less waiting for it to occur again.

I'm not familiar with traffic sniffing: is it appropriate to sniff in monitor mode and look at the encrypted packets? I tried Wireshark on an established connection in promiscuous mode, but I cannot see any packets that aren't my own.
Comment 8 Luca Coelho 2015-08-31 06:46:10 UTC
Usually the firmware of the Wi-Fi card filters out most of the packets that are not interesting (for instance, packets that are not addressed to your device), among other things, so you can't really get good results when sniffing on the same device you're trying to debug (or on any device that is connected, for that matter).

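Use a separate device, set it to monitor mode, and tune it to your AP's channel (the interface usually has to be down while its type is changed):

ip link set <interface_name> down
iw dev <interface_name> set type monitor
ip link set <interface_name> up
iw dev <interface_name> set channel <channel_of_your_AP>

Then you can start sniffing with Wireshark.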

Meanwhile, I'll try to take a look at the trace-cmd logs you provided.
Comment 9 Luca Coelho 2015-08-31 06:53:56 UTC
Hmmm... I can't decrypt the trace file since it's encrypted with Emmanuel's key. He is on extended vacation now, so I can't ask him to open it up.

Can you upload it again, maybe using my PGP key?

http://pgp.mit.edu/pks/lookup?op=vindex&search=0xA1479CA21A3CC5FA

(There you can see that I also work for Intel, though I'm logged in bugzilla with my private account).
Comment 10 Emmanuel Grumbach 2015-09-16 06:36:11 UTC
I re-encrypted the file with your key Luca:

https://www.dropbox.com/s/1lj5ki13u4pnscx/trace2.dat.7z.gpg?dl=0

Seems like we need a key for the group :)
Comment 11 Luca Coelho 2015-09-16 06:41:12 UTC
Thanks, Emmanuel!

Yeah, we should have a private key for our group.  I'll create one, then we can discuss how to securely share it among us later on.
Comment 12 ailin.nemui+kernel 2015-09-22 00:10:38 UTC
Hi, sorry for the late response. Did you manage to see anything in this trace, since Emmanuel did not?

I sent the computer to Dell for repair and they (said they) replaced the WLAN card, but the problem persists. In your judgement, could this be a hardware issue, and if so, do you have any guess as to which part to look at?

I'm still struggling with the recording. I tried it with a Ralink USB adapter on the same computer, but I can hardly see any packets in Wireshark. I also feel like NetworkManager is interfering with monitor mode; do you have any further guidance? Anyway, I will keep trying.

Thanks,
Comment 13 ailin.nemui+kernel 2015-09-22 00:19:51 UTC
Created attachment 188051 [details]
wireshark capture of iwlwifi card itself

I am attaching a Wireshark recording made directly on the Intel card, started after the connection broke down. I have no idea whether it is useful (probably not); it has the .201 IP. In the middle of the recording, I plugged in the .109, which is the Ralink USB WLAN card.
Comment 14 ailin.nemui+kernel 2015-09-22 00:23:11 UTC
By the way I am under the impression that this problem does not happen with 11n_disable, but I need to investigate this further
Comment 15 ailin.nemui+kernel 2015-10-01 18:47:17 UTC
Today I spent an evening trying to reproduce this bug. Fruitless! I downloaded several GB without any issue. Aggravating. At least I think I now understand how to monitor the traffic: I need to disable NetworkManager.
Comment 16 ailin.nemui+kernel 2015-10-02 18:39:45 UTC
Created attachment 189341 [details]
this capture done from 2nd device: connectivity is lost -> connectivity restored by hard reset

I am attaching this traffic capture. It starts at a point where connectivity had already been lost (the network was no longer working) and runs until a hard reset using rfkill, after which the network resumes operation.

I'm still trying to get a capture which shows the working -> broken transition.

Wireless was turned off at frame 12499 (deauthentication); I reset the wireless with rfkill, then started wget (now working again).

mac of broken device is ...:11:7e:cf
Comment 17 ailin.nemui+kernel 2015-10-03 12:08:13 UTC
Created attachment 189371 [details]
capture from 2nd device: connectivity works -> connectivity broken

Here's the trace where a wget download was running and then stalled.
Comment 18 ailin.nemui+kernel 2015-10-03 12:18:17 UTC
I feel like it is easier to reproduce when the link speed is around 140 Mbit/s.
Comment 19 Luca Coelho 2015-10-05 07:52:00 UTC
Thanks for the logs and persistence in trying to reproduce! :)

I have reported an internal bug to our firmware team to try to see what is going on.  We will keep you updated.
Comment 20 Luca Coelho 2015-11-16 08:46:24 UTC
Created attachment 193181 [details]
Core14 FW with uSniffer

Our firmware team is asking for more detailed firmware logs.  Can you please reproduce with the firmware attached?

Please capture trace-cmd logs (as before) and, as soon as possible after the problem occurs, capture the firmware logs as explained here:

https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging#firmware_debugging

Please make sure you read the notice about privacy.
Comment 21 Vinilox 2015-11-17 21:03:07 UTC
Hello,

It seems I have the same issue. The symptoms are the same: I am connected but have no network access. Sometimes it comes back after ~30-60 s, but most of the time I need to toggle airplane mode.

Here is the dmesg http://paste.vinilox.eu/?e3decaea10a483fb#ir8oUbujuK9yADH4PUvBpqPq12VRdhVJ4BGDJOkEF4s=

What can I do to help if this is the same issue?
Thanks for your help!
Comment 22 Vinilox 2015-11-17 21:04:51 UTC
Sorry, forgot to add:
Archlinux everything up to date
Kernel 4.3
Firmware 17.228510.0

I tried every firmware from -14 to -17.
Comment 23 Emmanuel Grumbach 2015-11-18 06:43:35 UTC
@Vinilox,

You seem to be suffering from the same problem indeed.
Please follow the steps described in comment 20 and provide the debug data.

thanks.
Comment 24 Vinilox 2015-11-18 13:12:14 UTC
Created attachment 194841 [details]
trace.dat encrypted vinilox
Comment 25 Vinilox 2015-11-18 13:23:37 UTC
Created attachment 194851 [details]
Dmesg with CONFIG_IWLWIFI_DEBUG=y

Here is a new dmesg with CONFIG_IWLWIFI_DEBUG=y.

I can't provide a FW debug log because I don't have the /sys/kernel/debug/iwlwifi/ directory. Sorry for that. My kernel config is here http://paste.vinilox.eu/?e163dee924c77cd7#b+nDJygUYlYowU1HcC7IR1Hly/YmRgD2/Y0gHKYW/e0= sorry if I forgot to set a config option.

If I set 11n_disable=1, this bug isn't present.

Thanks for your help.
Comment 26 Emmanuel Grumbach 2015-11-18 13:24:42 UTC
You are missing IWLWIFI_DEBUGFS. Ok, thanks. We have a few other reports saying that 11n_disable=1 makes the bug disappear.
Comment 27 Vinilox 2015-11-18 13:29:10 UTC
That was fast! Thank you, recompiling right now and sending you the log ASAP.
Comment 28 Vinilox 2015-11-18 14:18:38 UTC
Created attachment 194861 [details]
FW dump

Here is the FW dump. I hope it will be useful. If it's not, tell me and I will retry.
Thank you.
Comment 29 Emmanuel Grumbach 2015-11-18 19:44:48 UTC
How did you create this fw dump?
I don't see that you had the FW crash because of stuck queue here?
Actually, did you experience any WiFi malfunction when you created this fw dump?
Comment 30 Vinilox 2015-11-18 19:57:23 UTC
I did experience a wifi malfunction and just after I saw it in the dmesg I ran cat /sys/devices/virtual/devcoredump/devcd1/data > iwl.dump.
I will try again.
Comment 31 Emmanuel Grumbach 2015-11-18 19:59:23 UTC
Please attach the fw dump / dmesg of the same occurrence.
I suggest running with fw_restart=0. This prevents the firmware from being reloaded, so you can be sure the dump and the logs come from the same occurrence.
Of course, WiFi will not be functional until you reload the modules.
Comment 32 Vinilox 2015-11-18 20:05:20 UTC
Ok. Trying right now with fw_restart=0.
Comment 33 Vinilox 2015-11-18 20:13:00 UTC
Created attachment 194871 [details]
fwdump

Here is the new coredump, I launched cat /sys/devices/virtual/devcoredump/devcd1/data > fwdump after the no connectivity problem.
dmesg for this one http://paste.vinilox.eu/?512c1241a8c8757c#NdBH1FChBY/Gjr9cRWGgxudSGsUv9vcZ9FxYbeUdLYs=
I hope it will be helpful.
Comment 34 Emmanuel Grumbach 2015-11-18 20:15:30 UTC
Pasting here the relevant info from dmesg:

[   46.131973] iwlwifi 0000:01:00.0: Queue 16 stuck for 10000 ms.
[   46.131995] iwlwifi 0000:01:00.0: Current SW read_ptr 251 write_ptr 48
[   46.132064] iwl data: 00000000: ff 07 00 00 00 00 00 00 00 00 00 f8 00 00 00 00  ................
[   46.132094] iwlwifi 0000:01:00.0: FH TRBs(0) = 0x00000000
[   46.132119] iwlwifi 0000:01:00.0: FH TRBs(1) = 0xc011000a
[   46.132142] iwlwifi 0000:01:00.0: FH TRBs(2) = 0x00000000
[   46.132165] iwlwifi 0000:01:00.0: FH TRBs(3) = 0x80300007
[   46.132188] iwlwifi 0000:01:00.0: FH TRBs(4) = 0x00000000
[   46.132210] iwlwifi 0000:01:00.0: FH TRBs(5) = 0x00000000
[   46.132234] iwlwifi 0000:01:00.0: FH TRBs(6) = 0x00000000
[   46.132256] iwlwifi 0000:01:00.0: FH TRBs(7) = 0x0070907d
[   46.132319] iwlwifi 0000:01:00.0: Q 0 is active and mapped to fifo 3 ra_tid 0x0008 [8,8]
[   46.132381] iwlwifi 0000:01:00.0: Q 1 is active and mapped to fifo 2 ra_tid 0x0008 [0,0]
[   46.132444] iwlwifi 0000:01:00.0: Q 2 is active and mapped to fifo 1 ra_tid 0x0008 [198,198]
[   46.132506] iwlwifi 0000:01:00.0: Q 3 is active and mapped to fifo 0 ra_tid 0x0008 [0,0]
[   46.132568] iwlwifi 0000:01:00.0: Q 4 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
[   46.132617] iwlwifi 0000:01:00.0: Q 5 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
[   46.132667] iwlwifi 0000:01:00.0: Q 6 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
[   46.132716] iwlwifi 0000:01:00.0: Q 7 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
[   46.132766] iwlwifi 0000:01:00.0: Q 8 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
[   46.132815] iwlwifi 0000:01:00.0: Q 9 is active and mapped to fifo 7 ra_tid 0x0000 [126,126]
[   46.132865] iwlwifi 0000:01:00.0: Q 10 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
[   46.132915] iwlwifi 0000:01:00.0: Q 11 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
[   46.132964] iwlwifi 0000:01:00.0: Q 12 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
[   46.133014] iwlwifi 0000:01:00.0: Q 13 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
[   46.133063] iwlwifi 0000:01:00.0: Q 14 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
[   46.133113] iwlwifi 0000:01:00.0: Q 15 is active and mapped to fifo 5 ra_tid 0x0008 [0,0]
[   46.133162] iwlwifi 0000:01:00.0: Q 16 is active and mapped to fifo 1 ra_tid 0x0000 [251,48]
[   46.133212] iwlwifi 0000:01:00.0: Q 17 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
[   46.133262] iwlwifi 0000:01:00.0: Q 18 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
[   46.133311] iwlwifi 0000:01:00.0: Q 19 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
Comment 35 Vinilox 2015-11-18 20:17:05 UTC
Is the FW dump right this time?
Comment 36 Emmanuel Grumbach 2015-11-18 20:28:44 UTC
This one is excellent thanks.
Comment 37 Emmanuel Grumbach 2015-11-18 20:29:10 UTC
I may have to get back to you after I talk to the FW people (and that may take time).
Comment 38 Vinilox 2015-11-18 20:30:53 UTC
Great. Good luck and thank you!
No problem, I will do what I can to help.
Comment 39 Emmanuel Grumbach 2015-11-18 20:42:22 UTC
one last request (for now).
Can you please make sure that the bug still exists with the firmware from here:
https://git.kernel.org/cgit/linux/kernel/git/iwlwifi/linux-firmware.git/plain/iwlwifi-7260-16.ucode

Make sure to delete -17.ucode before you test.
No need to create fw_dump this time.

Thank you.
Comment 40 Vinilox 2015-11-18 20:54:12 UTC
Created attachment 194881 [details]
dmesg

I can confirm that the bug is still here with the new ucode.
Comment 41 Emmanuel Grumbach 2015-11-19 07:34:52 UTC
Created attachment 194891 [details]
Core14 FW with uSniffer with TX_MNG probes

Hi Vinilox,

I talked with the FW folks. It seems to be power-save related. Can you try to disable power save?
sudo iw dev wlan0 set power_save off

Moreover, I am attaching a new firmware with more probes. Can you please reproduce with this firmware and record a trace?
The goal here is to have the tracing output and fw dump of the same run.
Please run with fw_restart=0, that will make it easier.

Thanks for your cooperation!
Comment 42 Vinilox 2015-11-19 08:14:28 UTC
Trying this now.
And no problem, thank you for your help!
Comment 43 Vinilox 2015-11-19 08:28:13 UTC
Created attachment 194911 [details]
trace
Comment 44 Vinilox 2015-11-19 08:28:51 UTC
Created attachment 194921 [details]
iwl.dump
Comment 45 Vinilox 2015-11-19 08:30:12 UTC
Created attachment 194931 [details]
dmesg

And the dmesg. I hope it worked, because it crashed just after I set power_save to off; if not, tell me and I will retry.
Comment 46 Emmanuel Grumbach 2015-11-19 08:56:22 UTC
argh wait.
Sorry... I forgot to enable debug on the firmware...
I'll send a new firmware...
Comment 47 Emmanuel Grumbach 2015-11-19 09:29:31 UTC
Created attachment 194941 [details]
Core14 FW with uSniffer with TX_MNG probes

Ok - here we go.
Please start tracing as soon as you load the driver.
Then, let it go until you hit the bug with fw_restart=0.

Thanks!
Comment 48 Vinilox 2015-11-19 09:32:04 UTC
Ok. Doing this now.
Comment 49 Vinilox 2015-11-19 09:40:53 UTC
Created attachment 194951 [details]
dmesg
Comment 50 Vinilox 2015-11-19 09:41:22 UTC
Created attachment 194961 [details]
trace
Comment 51 Vinilox 2015-11-19 09:43:07 UTC
Created attachment 194971 [details]
iwl.dump

It crashes as soon as I set power_save to off, and if I read it correctly, dmesg says that too.
Comment 52 Emmanuel Grumbach 2015-11-19 12:11:03 UTC
Ok thanks.
I looked at it along with the FW team. We seem to be hearing energy in the air all the time, which prevents us from transmitting.
I wonder why this doesn't happen when 11n_disable is true.
Can you please record a trace with 11n_disable?
Since you won't have FW crash this time, you can use the manual trigger:

echo 1 > /sys/kernel/debug/iwlwifi/0000\:0X\:00.0/iwlmvm/fw_dbg_collect

Thanks!
Comment 53 Vinilox 2015-11-19 12:33:24 UTC
Created attachment 194981 [details]
iwl.dump 11n_disable

I recorded the trace for ~5 minutes while downloading an Ubuntu ISO with wget. The trace is too big to attach here; you can download it from http://cloud.vinilox.eu/index.php/s/9BrUPZwaUWDXiie
When I don't get the FW crash (with Wi-Fi N), the connection is on average slower than with Wi-Fi G. It reaches a higher maximum, but overall it is very variable and slower. Here I am in a building in a city with a LOT of N access points, but I also tried in a house with no Wi-Fi at all except my N AP, and the connection was still very slow (I don't remember whether I got the FW crash). I hope this helps.
Tell me if you need something else.
Comment 54 Emmanuel Grumbach 2015-11-19 18:38:12 UTC
So all the captures you sent until now are in a very busy environment?
If the bug also reproduces in a quiet environment, I'll appreciate if you could create a dump there. No need to disable power save.

Many thanks.
Comment 55 Vinilox 2015-11-19 19:14:25 UTC
Yes, it's in a very busy environment.
I will do this on Sunday; I can't go there before then, sorry. Only the dump, or dump + trace?
No problem, thank you.
Comment 56 Emmanuel Grumbach 2015-11-19 19:16:32 UTC
No problem. It is kind of the weekend anyway :)
Dump + dmesg. Thanks.
Comment 57 Vinilox 2015-11-19 19:17:41 UTC
Haha, true :D
OK, will do.
Have a nice weekend!
Comment 58 Emmanuel Grumbach 2015-11-23 07:24:45 UTC
Any news for me?
Comment 59 Vinilox 2015-11-23 08:02:05 UTC
Created attachment 195191 [details]
iwl.dump quiet environment
Comment 60 Vinilox 2015-11-23 08:02:41 UTC
Created attachment 195201 [details]
trace.dat quiet environment
Comment 61 Vinilox 2015-11-23 08:05:06 UTC
Created attachment 195211 [details]
dmesg quiet environment

I couldn't reproduce a crash, but I got a lot of connection drops/reconnects. I hope it helps.
Have a nice week.
Comment 62 Emmanuel Grumbach 2015-11-23 08:09:48 UTC
Excellent, I will take a look today and forward this to the firmware team.

Many thanks for your help.
Comment 63 Vinilox 2015-11-23 08:12:49 UTC
Thank you.
You're welcome, glad I could help. Tell me if you need something else.
Comment 64 Emmanuel Grumbach 2015-11-25 10:12:02 UTC
Can you try to set cfg80211_disable_40mhz_24ghz to true?
This is a module parameter of the cfg80211 module.

Thanks.
Comment 65 Vinilox 2015-11-25 10:14:03 UTC
Trying now.
Comment 66 Vinilox 2015-11-25 10:21:17 UTC
Created attachment 195381 [details]
iwl.dump cfg80211_disable_40mhz_24ghz=true
Comment 67 Vinilox 2015-11-25 10:21:35 UTC
Created attachment 195391 [details]
trace.dat cfg80211_disable_40mhz_24ghz=true
Comment 68 Vinilox 2015-11-25 10:24:53 UTC
Created attachment 195401 [details]
dmesg cfg80211_disable_40mhz_24ghz=true

With 
options iwlwifi fw_restart=0
option cfg80211 cfg80211_disable_40mhz_24ghz=true

The crash still happens. My router does use 40 MHz, but I can also set it to 20 MHz if that helps.
Comment 69 Emmanuel Grumbach 2015-11-25 10:30:13 UTC
(In reply to Vinilox from comment #68)

> Still the crash. My router does use 40MHz but I can also set it to 20 MHz if
> that can help.

yes please.
Comment 70 Vinilox 2015-11-25 10:40:28 UTC
Created attachment 195411 [details]
dmesg router set to 20MHz

It seems this is the problem. I set my router to 20 MHz and was able to download an entire Ubuntu ISO without a firmware crash or deauthentication. The dmesg looks like the Wi-Fi G one.
The speed is around 4 MB/s, with a maximum around 5 MB/s, and it's STABLE (I never got a stable speed with Wi-Fi N before).
Do you want the trace.dat and a manual dump?
Comment 71 Emmanuel Grumbach 2015-11-25 10:42:54 UTC
yes please.

Do you have other Wi-Fi devices that operate properly when the bandwidth is set to 40 MHz?
Comment 72 Vinilox 2015-11-25 10:45:18 UTC
Created attachment 195421 [details]
iwl.dump router set to 20MHz
Comment 73 Emmanuel Grumbach 2015-11-25 10:48:34 UTC
Thanks.
We are making progress.

One more thing.

Can you please send the output of your scan?

This will contain information that you may want to encrypt as well.

sudo iw wlan0 scan

thanks.
Comment 74 Vinilox 2015-11-25 10:49:05 UTC
Here is the trace.dat: https://cloud.vinilox.eu/index.php/s/bOfZuNTn7Ss0JZE (the file is too big to attach here).
I have a Nexus 4, a Moto 360, a Moto G (2nd gen), and two MacBook Airs (2014/2015). I also have this problem with my university Wi-Fi and with every N Wi-Fi I have tried.
Comment 75 Vinilox 2015-11-25 10:51:16 UTC
Created attachment 195431 [details]
scan router set to 20MHz
Comment 76 Vinilox 2015-11-25 10:53:01 UTC
Created attachment 195441 [details]
router parameters

Here are all the router parameters I can tweak, maybe it could help.
Great to see you are making progress. :)
Comment 77 Emmanuel Grumbach 2015-11-25 10:54:53 UTC
Well... We know more, but this kind of issue is a real pain for the firmware team. I can't promise anything about when we will have a fix, but at least we know in what area the culprit is.
Comment 78 Vinilox 2015-11-25 10:55:17 UTC
If this helps, I also have a Chromecast (1st gen) which works on Wi-Fi N at 40 MHz.
I can't tell you whether all the devices I listed are using 40 MHz, sorry.
Comment 79 Vinilox 2015-11-25 10:56:43 UTC
I understand. But now that I know it works with 40 MHz disabled or with Wi-Fi N disabled, it's better; before that, the Wi-Fi/laptop was unusable.
Anyway, thank you and the firmware team, and good luck!
Comment 80 Emmanuel Grumbach 2015-11-25 10:58:14 UTC
Is your AP on channel 9 or channel 13?

In 20 MHz mode you are now on channel 13; before that you seemed to be on channel 9?
Comment 81 Emmanuel Grumbach 2015-11-25 10:59:06 UTC
I am just asking whether these devices perform better than Intel while the AP is configured for 40 MHz. I am not asking whether they actually use the 40 MHz spectrum, just whether they operate properly when the AP is configured as it was.
Comment 82 Vinilox 2015-11-25 11:01:52 UTC
My AP is now on channel 13; previously (40 MHz) it was using channel 9 (primary) and channel 13 (secondary), IIRC. It scans for the best channel.
OK, so yes, they all work properly.
BTW, I just saw that I messed up with cfg80211_disable_40mhz_24ghz=Y, so it was still using 40 MHz, sorry. Do you want me to do it again with the AP at 40 MHz and the laptop at 20 MHz?
Comment 83 Emmanuel Grumbach 2015-11-25 11:03:37 UTC
(In reply to Vinilox from comment #82)
> BTW I just saw I messed up with cfg80211_disable_40mhz_24ghz=Y, so it was
> still with 40MHz sorry. Do you want me to do it again with AP 40MHz and
> laptop 20MHz ?

Yes please.
Comment 84 Vinilox 2015-11-25 11:09:32 UTC
Created attachment 195461 [details]
iwl.dump cfg80211_disable_40mhz_24ghz=Y
Comment 85 Vinilox 2015-11-25 11:12:19 UTC
Created attachment 195471 [details]
dmesg cfg80211_disable_40mhz_24ghz=Y

Trace.dat https://cloud.vinilox.eu/index.php/s/5L5pizH1BiJu3Ar

It was slower even within 1 m of the AP: 2 MB/s vs 5 MB/s with the AP set to 20 MHz, but that might be due to the channel change (8 and 12 this time).
Comment 86 Vinilox 2015-11-25 11:13:22 UTC
I didn't experience a firmware crash.
As always, I was downloading ubuntu.iso with wget.
Comment 87 Vinilox 2015-11-25 12:27:38 UTC
I tested again with channels 9 and 13 and I don't have the speed issue mentioned in comment 85.
Comment 88 Emmanuel Grumbach 2015-11-25 12:30:42 UTC
OK, so bottom line: moving to 20 MHz helps whether it is done from the AP or from the client side. Correct?
Comment 89 Vinilox 2015-11-25 12:39:13 UTC
Sorry for the noise.

After more testing:
- I don't have firmware crash whether it's done from AP or from client.
- But if done from client, the speed is less consistent and slower.
Comment 90 Emmanuel Grumbach 2015-12-06 12:26:11 UTC
Hi again,

We are trying to define a solution for your case. In order to define a suitable solution from the system point of view, we'd need you to test with the following hack. If it helps, we will have a direction for a better fix:

diff --git a/drivers/net/wireless/iwlwifi/mvm/rs.c b/drivers/net/wireless/iwlwifi/mvm/rs.c
index e23a579..f6fa02d 100644
--- a/drivers/net/wireless/iwlwifi/mvm/rs.c
+++ b/drivers/net/wireless/iwlwifi/mvm/rs.c
@@ -1519,10 +1519,10 @@ static s32 rs_get_best_rate(struct iwl_mvm *mvm,

 static u32 rs_bw_from_sta_bw(struct ieee80211_sta *sta)
 {
-       if (sta->bandwidth >= IEEE80211_STA_RX_BW_80)
-               return RATE_MCS_CHAN_WIDTH_80;
-       else if (sta->bandwidth >= IEEE80211_STA_RX_BW_40)
-               return RATE_MCS_CHAN_WIDTH_40;
+//     if (sta->bandwidth >= IEEE80211_STA_RX_BW_80)
+//             return RATE_MCS_CHAN_WIDTH_80;
+//     else if (sta->bandwidth >= IEEE80211_STA_RX_BW_40)
+//             return RATE_MCS_CHAN_WIDTH_40;

        return RATE_MCS_CHAN_WIDTH_20;
 }
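To make the effect of this hack concrete, here is a minimal standalone C sketch of the bandwidth selection before and after the patch. The enum names and values are hypothetical stand-ins for the kernel's IEEE80211_STA_RX_BW_* and RATE_MCS_CHAN_WIDTH_* constants (the real definitions differ); only the comparison logic is taken from the diff.

```c
#include <assert.h>

/* Hypothetical stand-ins for IEEE80211_STA_RX_BW_*, ordered narrowest to
 * widest so the >= comparisons behave as in the driver. Values are illustrative. */
enum sta_rx_bw { STA_RX_BW_20, STA_RX_BW_40, STA_RX_BW_80, STA_RX_BW_160 };

/* Hypothetical stand-ins for RATE_MCS_CHAN_WIDTH_*, expressed in MHz. */
enum chan_width { CHAN_WIDTH_20 = 20, CHAN_WIDTH_40 = 40, CHAN_WIDTH_80 = 80 };

/* Unpatched logic: use the widest width the peer supports, capped at 80 MHz. */
static enum chan_width bw_from_sta_bw(enum sta_rx_bw bw)
{
	assert(bw <= STA_RX_BW_160);
	if (bw >= STA_RX_BW_80)
		return CHAN_WIDTH_80;
	else if (bw >= STA_RX_BW_40)
		return CHAN_WIDTH_40;
	return CHAN_WIDTH_20;
}

/* Patched logic: with the four lines commented out, rate scaling always
 * picks 20 MHz, whatever bandwidth was negotiated with the AP. */
static enum chan_width bw_from_sta_bw_hacked(enum sta_rx_bw bw)
{
	(void)bw;
	return CHAN_WIDTH_20;
}
```

The hack only changes anything when 40 MHz (or wider) was negotiated, since a 20 MHz peer already maps to 20 MHz in both versions.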


thanks.
Comment 91 Emmanuel Grumbach 2015-12-06 12:26:42 UTC
Of course, we'd need you to test this with 40 MHz on both ends.
Comment 92 Vinilox 2015-12-09 13:04:52 UTC
Created attachment 196961 [details]
dmesg with patched kernel
Comment 93 Vinilox 2015-12-09 13:05:25 UTC
Created attachment 196971 [details]
iwl.dump with patched kernel
Comment 94 Emmanuel Grumbach 2015-12-09 13:06:52 UTC
Does it feel better?
Comment 95 Vinilox 2015-12-09 13:09:30 UTC
Sorry for the delayed answer.
With the 4 lines commented out and 40 MHz on both ends, I have no FW crash. I successfully downloaded ubuntu.iso twice, but I noticed that within 10 minutes wget hung 2-3 times: it was stuck, dmesg showed nothing, a ping to Google appeared to hang as well, and 20-30 s later everything was back to normal. Speed was consistent at around 5 MB/s (except during the hangs, when wget showed --- B/s), with a maximum of ~7 MB/s.
Trace.dat here https://cloud.vinilox.eu/index.php/s/JDtJD7yAx8lnLzI

It definitely feels better: there are no FW crashes, only hangs, so downloads/transfers were not corrupted. I can use this daily. Thanks ! :)
Comment 96 Emmanuel Grumbach 2015-12-09 13:15:37 UTC
wait - this is only the start...

We wanted you to run this to define a proper solution in the long run.
We had originally run lab experiments on this, but they weren't conclusive. Your real-life experiments are very valuable to us.

This gives us a hint as to which system direction we want to take.
Now that we know that this helps, we need to properly define the solution and implement it (most probably in firmware). I managed to get the system and firmware teams involved, and I really hope that we are now heading toward a solution.

I have no words to express my gratitude for your patience and your priceless input.
Comment 97 Vinilox 2015-12-09 13:22:45 UTC
I understand. If you need my help, just ask me and I will try to help.
I am glad this will be solved and even without a proper fix, I can now use my laptop (without wifi it was useless).
Big thumbs up and thanks to you, the system and firmware teams ! You are the ones to thank, not me. :)
Comment 98 Emmanuel Grumbach 2015-12-27 21:16:48 UTC
*** Bug 107471 has been marked as a duplicate of this bug. ***
Comment 99 Emmanuel Grumbach 2015-12-29 07:23:32 UTC
Hi Vinilox,

I had a discussion about this issue with the firmware team. They want to explore the PHY direction. Basically what happens is that we keep sensing energy on the extension channel but not on the control channel. This is why we can't transmit data in 40Mhz and this is also why if we limit ourselves to 20Mhz it helps (we don't check what happens on the whole 40MHz bandwidth).
From here, there are 2 possible directions:
1) Limit ourselves to 20Mhz in this case, but some firmware people aren't happy with this approach because it might cause other issues: if we hear energy on the extension channel, the AP should be hearing it as well and hence our transmission in 20Mhz will be botched anyway.
2) Assume that we *wrongly* sense energy on the extension channel and debug why we *think* there is energy when the air might be clean.
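A simplified C model of the situation described above, assuming (as in 802.11n) that a 40 MHz frame can only be sent when both the control and the extension channel are sensed idle. The struct and function are hypothetical and do not reflect how the firmware is organized; `allow_20mhz_fallback` stands for direction 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-20 MHz clear channel assessment (CCA) result. */
struct cca_state {
	bool control_busy;   /* energy sensed on the control (primary) channel */
	bool extension_busy; /* energy sensed on the extension (secondary) channel */
};

/* Returns the bandwidth in MHz usable for the next data frame, or 0 if the
 * frame has to be deferred. allow_20mhz_fallback models direction 1. */
static int usable_width_mhz(struct cca_state cca, bool allow_20mhz_fallback)
{
	if (cca.control_busy)
		return 0;	/* control channel busy: nothing can be sent */
	if (!cca.extension_busy)
		return 40;	/* the whole 40 MHz is idle */
	/* Control idle but extension busy: the case seen in the logs. */
	return allow_20mhz_fallback ? 20 : 0;
}
```

Direction 2 leaves this decision alone and instead asks why `extension_busy` is reported as true when the air may actually be clean.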

For now, we are going for direction number 2. This means that I am working with other people (PHY people) in the firmware team and they are asking more questions :)

We would like to know if, after we reset the firmware (because of the firmware crash), we can have *any* successful transmission in 40Mhz.
To know that, we need a trace that lasts long enough to cover both before and after the crash. All you need to do is keep enough traffic running (ping -i 0.2 is enough) so that the driver tries to raise its transmission rate after the crash. This will let us see whether resetting the firmware at least allows transmission in 40 MHz.
If that turns out to be the case, it would mean that there is a bug in the firmware / PHY that can be fixed or at least worked around.
If not, it really means that the environment you are working in is such that we will not ever be able to transmit in 40Mhz and we'll need to check for direction number 1 again.
Comment 100 Emmanuel Grumbach 2015-12-31 19:55:42 UTC
Hi all,

we made some progress with another user not on bugzilla.
I am talking about the FW crash only (not high ping latency). Can you please run with power_scheme=1 as a module parameter to iwlmvm?

echo "options iwlmvm power_scheme=1" | sudo tee -a /etc/modprobe.d/iwlwifi.conf

reboot.


thanks!
Comment 101 Ansa89 2016-01-03 14:09:05 UTC
Apparently with "power_scheme=1" the firmware doesn't crash (using the debug firmware you provided me in my original bug report).
Further tests will follow.

My original bug report is https://bugzilla.kernel.org/show_bug.cgi?id=107471.
Comment 102 Emmanuel Grumbach 2016-01-03 14:25:15 UTC
(In reply to Ansa89 from comment #101)
> Apparently with "power_scheme=1" the firmware doesn't crash (using the debug
> firmware you provided me in my original bug report).
> Further tests will follow.
> 
> My original bug report is https://bugzilla.kernel.org/show_bug.cgi?id=107471.

Ok - so, with power_scheme=1 it doesn't crash, but with power_scheme=2 (which is the default) and iw wlan0 set power_save off, it does crash.  Right?
Comment 103 Ansa89 2016-01-03 15:55:21 UTC
|--------------|------------|------------|
| power_scheme | power_save |   RESULT   |
|--------------|------------|------------|
| 2 (default)  | off        | CRASH      |
|--------------|------------|------------|
| 2 (default)  | on         | CRASH      |
|--------------|------------|------------|
| 1            | off        | OK         |
|--------------|------------|------------|
| 1            | on         | NOT TESTED |
|--------------|------------|------------|
Comment 104 Emmanuel Grumbach 2016-01-03 16:15:18 UTC
Couldn't be clearer :)
Comment 105 Emmanuel Grumbach 2016-01-04 08:29:46 UTC
can you please remove power_scheme=1 and do the following (as root):

echo lprx=0 > /sys/kernel/debug/ieee80211/phy0/netdev:wlan0/iwlmvm/pm_params

This will disable a power save feature that can be causing the troubles.

thanks.
Comment 106 Ansa89 2016-01-04 18:17:38 UTC
Took me a while, but in the end it crashed.
I have collected a dump just in case you need it.

Also when collecting the dump I got this oops in dmesg:
------------[ cut here ]------------
WARNING: CPU: 2 PID: 3601 at drivers/net/wireless/iwlwifi/mvm/fw.c:824 iwl_mvm_fw_dbg_collect_desc+0xab/0xc0 [iwlmvm]()
Modules linked in: iwlmvm iwlwifi uas usb_storage hid_generic mac80211 cfg80211 ipv6 rfcomm bnep lp ppdev parport_pc parport fuse snd_hda_codec_hdmi i2c_dev hid_multitouch uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core usbhid v4l2_common hid videodev btusb btrtl btbcm btintel bluetooth joydev snd_hda_codec_realtek snd_hda_codec_generic i915 drm_kms_helper drm snd_hda_intel snd_hda_codec intel_gtt snd_hda_core psmouse agpgart snd_hwdep i2c_algo_bit xhci_pci fb_sys_fops syscopyarea xhci_hcd sysfillrect snd_pcm sysimgblt snd_timer mei_me i2c_i801 snd mei soundcore ideapad_laptop ehci_pci ehci_hcd efivars sparse_keymap i2c_core lpc_ich wmi rfkill serio_raw evdev soc_button_array loop [last unloaded: iwlwifi]
CPU: 2 PID: 3601 Comm: bash Tainted: G        W       4.3.3 #1
Hardware name: LENOVO 20268/Cherry 3A Touch, BIOS 7CCN62WW 11/13/2014
 ffffffffc04e0838 ffff880201213d60 ffffffff815df9c8 0000000000000000
 ffff880201213d98 ffffffff81086608 ffff88006dec1428 0000000000000000
 ffff88006dc84290 0000000000000000 0000000000000001 ffff880201213da8
Call Trace:
 [<ffffffff815df9c8>] dump_stack+0x44/0x5c
 [<ffffffff81086608>] warn_slowpath_common+0x88/0xc0
 [<ffffffff810866fa>] warn_slowpath_null+0x1a/0x20
 [<ffffffffc04ac3ab>] iwl_mvm_fw_dbg_collect_desc+0xab/0xc0 [iwlmvm]
 [<ffffffffc04ac424>] iwl_mvm_fw_dbg_collect+0x64/0x80 [iwlmvm]
 [<ffffffffc04cfd5a>] _iwl_dbgfs_fw_dbg_collect_write+0x7a/0xb0 [iwlmvm]
 [<ffffffff811a7488>] __vfs_write+0x28/0xf0
 [<ffffffff81193471>] ? kmem_cache_alloc+0xe1/0x200
 [<ffffffff811b5e9f>] ? getname_flags+0x4f/0x1f0
 [<ffffffff810c1691>] ? percpu_down_read+0x21/0x50
 [<ffffffff811a7b49>] vfs_write+0xa9/0x190
 [<ffffffff811a8856>] SyS_write+0x46/0xa0
 [<ffffffff81c3b21b>] entry_SYSCALL_64_fastpath+0x16/0x6e
---[ end trace 4f15751d1323a623 ]---
iwlwifi 0000:02:00.0: Collecting data: trigger 1 fired.
iwlwifi 0000:02:00.0: FW already configured (0) - re-configuring
Comment 107 Emmanuel Grumbach 2016-01-04 18:25:47 UTC
So it works better than the normal configuration?
Please send me the dump. Thanks
Comment 108 Ansa89 2016-01-04 18:48:41 UTC
Not really: after the initial crash, subsequent crashes happen often (maybe the firmware reinitialization overwrites the "lprx=0" parameter?).
Moreover, I have to say I'm not in the same environment, so I don't know whether that played an important role.

Dumps sent by email.
Comment 109 Emmanuel Grumbach 2016-01-04 20:25:55 UTC
I'll check the persistence of the configuration tomorrow.
Can you please send the whole dmesg output about the oops? Thanks.
Comment 110 Ansa89 2016-01-04 22:30:04 UTC
Dmesg sent by email.
Comment 111 Emmanuel Grumbach 2016-01-05 07:50:46 UTC
Thanks for the dmesg. I think I found the problem and fixed it in our internal tree.

We are analyzing the logs you sent, but it seems that the issue you are seeing with lprx=0 is slightly different although the effect is the same.
There is another user that reported that disabling LPRX helped.
Comment 112 Ansa89 2016-01-05 11:27:33 UTC
Today I will try to do some tests in the original environment (if I get some spare time).
Comment 113 Emmanuel Grumbach 2016-01-05 13:41:54 UTC
I checked the persistence of the lprx over firmware crash: it is persistent.
Comment 114 Emmanuel Grumbach 2016-01-06 08:14:03 UTC
@Ansa89:

I analyzed the logs with the firmware team experts. What is happening is that the AP is allocating the air for a very long time and we can't get a chance to send our own traffic. When we do get a chance to transmit data, we can't get ACK from the AP.

This needs to be further debugged, but this is not the same issue as the original one.
Comment 115 Vinilox 2016-01-06 08:18:47 UTC
Hello,

I'm still here and following this issue; sorry, I didn't have time to test. I will be able to send dmesg/crash/FW dumps by the end of the week.
Comment 116 Ansa89 2016-01-10 18:29:47 UTC
(In reply to Emmanuel Grumbach from comment #105)
> can you please remove power_scheme=1 and do the following (as root):
> 
> echo lprx=0 > /sys/kernel/debug/ieee80211/phy0/netdev:wlan0/iwlmvm/pm_params
> 
> This will disable a power save feature that can be causing the troubles.
> 
> thanks.

Tested in the first environment: no crashes, but the connection speed is very unstable; using wget to download a single file I got speeds between 500 KB/s and 2 KB/s (with an average speed of ~10 KB/s).
Moreover wget often shows "---" as current speed.



(In reply to Emmanuel Grumbach from comment #114)
> @Ansa89:
> 
> I analyzed the logs with the firmware team experts. What is happening is
> that the AP is allocating the air for a very long time and we can't get a
> chance to send our own traffic. When we do get a chance to transmit data, we
> can't get ACK from the AP.
> 
> This needs to be further debugged, but this is not the same issue as the
> original one.

Should I open a separate bug report?
Comment 117 Emmanuel Grumbach 2016-01-10 18:45:22 UTC
(In reply to Ansa89 from comment #116)
> (In reply to Emmanuel Grumbach from comment #105)
> > can you please remove power_scheme=1 and do the following (as root):
> > 
> > echo lprx=0 > /sys/kernel/debug/ieee80211/phy0/netdev:wlan0/iwlmvm/pm_params
> > 
> > This will disable a power save feature that can be causing the troubles.
> > 
> > thanks.
> 
> Tested in the first environment: no crashes, but the connection speed is
> very unstable; using wget to download a single file I got speeds between 500
> KB/s and 2 KB/s (with an average speed of ~10 KB/s).
> Moreover wget often shows "---" as current speed.
> 

Ok - the environment is very busy to start with. I'd be glad to have a dump in this situation (same environment and lprx disabled). You can trigger a dump by doing:

echo 1 > /sys/kernel/debug/iwlwifi/0000\:0X\:00.0/iwlmvm/fw_dbg_collect 

> 
> 
> (In reply to Emmanuel Grumbach from comment #114)
> > @Ansa89:
> > 
> > I analyzed the logs with the firmware team experts. What is happening is
> > that the AP is allocating the air for a very long time and we can't get a
> > chance to send our own traffic. When we do get a chance to transmit data, we
> > can't get ACK from the AP.
> > 
> > This needs to be further debugged, but this is not the same issue as the
> > original one.
> 
> Should I open a separated bug report?

I don't think it is worth it at this stage. In that environment, can you please test this?

diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/fw-api-tx.h b/drivers/net/wireless/intel/iwlwifi/mvm/fw-api-tx.h
index 0036d18..0c6b38c 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/fw-api-tx.h
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/fw-api-tx.h
@@ -189,7 +189,7 @@ enum iwl_tx_pm_timeouts {
  */
 #define IWL_DEFAULT_TX_RETRY                   15
 #define IWL_MGMT_DFAULT_RETRY_LIMIT            3
-#define IWL_RTS_DFAULT_RETRY_LIMIT             60
+#define IWL_RTS_DFAULT_RETRY_LIMIT             3
 #define IWL_BAR_DFAULT_RETRY_LIMIT             60
 #define IWL_LOW_RETRY_LIMIT                    7
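The patch lowers IWL_RTS_DFAULT_RETRY_LIMIT from 60 to 3. Below is a hypothetical C model of what such a limit bounds, assuming it counts the total number of RTS frames sent for one frame before the RTS/CTS handshake is abandoned; how the firmware really counts retries, and what it does once the limit is reached, is not stated in this thread. All names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical callback: did the RTS sent on this attempt (0-based) get a CTS? */
typedef bool (*cts_fn)(unsigned attempt);

struct rts_outcome {
	unsigned rts_sent; /* RTS frames transmitted */
	bool got_cts;      /* true if the handshake eventually succeeded */
};

/* Send RTS until a CTS comes back or retry_limit attempts are used up. */
static struct rts_outcome rts_exchange(unsigned retry_limit, cts_fn cts_received)
{
	struct rts_outcome out = { 0, false };

	assert(cts_received);
	while (out.rts_sent < retry_limit) {
		bool cts = cts_received(out.rts_sent);

		out.rts_sent++;
		if (cts) {
			out.got_cts = true;
			break;
		}
	}
	return out;
}

/* Example peers: one that never answers, one that answers the third RTS. */
static bool never_cts(unsigned attempt) { (void)attempt; return false; }
static bool cts_on_third(unsigned attempt) { return attempt == 2; }
```

Under this model, `rts_exchange(60, never_cts)` sends 60 RTS frames and `rts_exchange(3, never_cts)` sends 3, so an unanswered handshake is given up after 3 attempts instead of 60.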
Comment 118 Emmanuel Grumbach 2016-01-13 08:01:39 UTC
Vinilox? :)
Comment 119 Emmanuel Grumbach 2016-01-19 06:47:22 UTC
I know that we ask a lot from the submitters, but unfortunately we still need your cooperation to continue investigating this bug. When can we get the required data?
Comment 120 Vinilox 2016-01-22 16:43:05 UTC
Created attachment 200991 [details]
dmesg power_scheme=1
Comment 121 Vinilox 2016-01-22 16:43:51 UTC
Created attachment 201001 [details]
trace.dat power_scheme=1
Comment 122 Vinilox 2016-01-22 16:44:59 UTC
Created attachment 201011 [details]
iwl.dump power_scheme=1
Comment 123 Vinilox 2016-01-22 16:48:34 UTC
Really sorry for the BIG delay, I had a lot of IRL things to solve... :(
Now I am free to do any testing you want.

Here is a summary

|--------------|------------|------------|
| power_scheme | power_save |   RESULT   |
|--------------|------------|------------|
| 2 (default)  | off        | CRASH      |
|--------------|------------|------------|
| 2 (default)  | on         | CRASH      |
|--------------|------------|------------|
| 1            | off        | NOT TESTED |
|--------------|------------|------------|
| 1            | on         | CRASH      |
|--------------|------------|------------|

What should I test now? I didn't know whether I had to test the same things as @Ansa89; is this the same issue?
Just tell me what you want me to test and I will do it quickly.
Sorry again :(
Comment 124 Emmanuel Grumbach 2016-01-23 22:12:30 UTC
Thanks.

I'll analyze your dump to see if you faced the same issue as before or this is a new issue.
Comment 125 Emmanuel Grumbach 2016-01-24 08:04:50 UTC
I looked at your logs. I can see the original problem: we still can't get a chance to transmit.
This is somewhat disappointing because Ansa89 saw a different issue after disabling power save (Low Power Rx really) and it fixed the problem for a user using 8260.

Back to square one...
Comment 126 Vinilox 2016-01-24 12:18:14 UTC
Indeed it's disappointing. Tell me what you want me to test and I will.


In case this helps:

- Laptop with a single-band (2.4 GHz only) 7260-N http://ark.intel.com/products/75174/Intel-Wireless-N-7260, antennas under the palm rests (one on the left, one on the right): this is the setup I've been using here, and it crashes with 40 MHz.

- Desktop (since yesterday): 7260-AC http://ark.intel.com/products/75439/Intel-Dual-Band-Wireless-AC-7260 with a PCI-E adapter http://www.amazon.fr/gp/product/B016RU3T6S?psc=1&redirect=true&ref_=oh_aui_detailpage_o01_s00 and 2x 6 dB antennas, connected to the 2.4 GHz Wi-Fi for testing: no problem with 40 MHz; N (2.4 GHz, 40 MHz) and AC (5 GHz, 80 MHz) both work perfectly.

Same AP, with one 2.4 GHz SSID and one 5 GHz SSID.


If you want, I can put the laptop card in the desktop to test. I can't do the reverse (desktop AC card in the laptop) because Lenovo uses a whitelist in its BIOS and the AC card is not allowed...
Comment 127 Emmanuel Grumbach 2016-01-24 12:26:04 UTC
This is really strange... I guess you can try to put the laptop card in the desktop, maybe the problem will be "solved" by the strong antennas.
Comment 128 Vinilox 2016-01-24 12:32:11 UTC
I will try this and tell you what happened.
Comment 129 Vinilox 2016-01-25 15:57:04 UTC
So, my laptop died: the motherboard is dead... I will try to reproduce the laptop conditions with my PCI-E adapter, the laptop card and its antennas.
Comment 130 Vinilox 2016-01-25 16:35:43 UTC
With the laptop card in the desktop :

- With the 6 dB antennas → no crash (40 MHz), much better speed than in the laptop, same speed as the 7260-AC in N mode.

- With the laptop antennas → no crash and always connected to the AP, BUT sometimes the network is unreachable and I have to toggle airplane mode on/off to make it work again; speed is slightly better than in the laptop.

I have the impression that just pulling the antenna cable slightly away (10 cm) improved everything: no crash (yet), and connection drops happen less often.

Maybe the issue comes from this particular laptop's antenna layout: the antennas are under the palm rests and the antenna cables are pressed tightly against the battery.

If you want, I can try again with very short antenna cables.

Sorry that I can't help more with my dead laptop. :(
Comment 131 Vinilox 2016-01-25 16:48:49 UTC
I just made another test with a short antenna cable surrounded by everything I could pile around it: it takes 3-4 attempts to connect (3 tries each), and I often get "network unreachable" while staying connected to the AP the whole time, but I can't get it to crash...

Maybe the laptop was emitting some weird EM interference?
Comment 132 Emmanuel Grumbach 2016-01-25 17:42:11 UTC
I am sorry to hear about your laptop.
What is the laptop model?
It really looks like an antenna layout problem but I don't understand much about this.
I'll ask people who may know more.

Can you please record a dump with the setup you were describing in comment #131?

You can trigger the dump with the fw_dbg_collect debugfs hook in mvm.

Thanks.
Comment 133 Vinilox 2016-01-25 18:26:23 UTC
It was a Lenovo Yoga 2 13" (not the Pro).
Ok, if you need pictures or something else, tell me. I can take it apart.

I will do this tomorrow.
Comment 134 Emmanuel Grumbach 2016-01-27 08:48:39 UTC
Your latest tests seem to show that you suffered from an antenna issue. This can happen on certain systems and is beyond Intel's responsibility.

I will close this issue as will not fix.
Despite the fact that this bug will now be closed, I'll still receive notifications about any comment someone may add to this bug.
Comment 135 Emmanuel Grumbach 2016-01-27 08:56:52 UTC
Note that Intel's privacy policy forbids me to keep the data collected in this bug.

I am now deleting all the captures from our systems.
Comment 136 Linas Zvirblis 2016-02-15 21:02:29 UTC
I followed "duplicate" links from my bug report 107861 to this one, and I am quite puzzled as to what exactly the solution is here. I seem to have exactly the same problem as the original reporter, but on a Dell M3800 laptop, so it is not very likely to be the exact same antenna issue.
