Bug 93431

Summary: iwlmvm: 7260: intermittent connection (no data transferred) - MWG100230068
Product: Drivers Reporter: Stefan Soeffing (soeffing)
Component: network-wirelessAssignee: drivers_network-wireless (drivers_network-wireless)
Status: CLOSED CODE_FIX    
Severity: normal CC: enrico.tagliavini, hashem, ilw, linville, myspecialids-kernel, soeffing, xaviermiller
Priority: P1    
Hardware: Intel   
OS: Linux   
See Also: https://bugzilla.kernel.org/show_bug.cgi?id=95941
Kernel Version: 3.19 Subsystem:
Regression: No Bisected commit-id:
Attachments: sudo trace-cmd record -e iwlwifi -e mac80211 -e iwlwifi_msg
Core9 firmware with uSniffer
iwl.dump
sudo trace-cmd record -e iwlwifi -e mac80211 -e iwlwifi_msg
iwl.dump
dump
dump (with firmware tracing enabled)
dump (with firmware tracing enabled)
Core9 firmware with uSniffer
dmesg output
dmesg output NUC iwl 7265 kernel 3.19.3
dmesg output NUC iwl 7265 kernel 4.0
dmesg output - Tx queue problem?

Description Stefan Soeffing 2015-02-17 20:19:14 UTC
Dear all,

my Intel 7260 wifi is causing me troubles since quite some time now. Symptoms have been different in the past, from low data rates (became better when I installed a better antenna, though I'm no more than 4 meters away from the router) to frequent disconnects.

With the current firmware/driver combination, disconnects are much less frequent (~ once a day) but data transmission is quite instable. On a regular basis (~ every 5-10 minutes), data transmission is interrupted:
- WiFi is still connected
- no data transmission
- ping to router: packet loss, after some time I get the message
ping: sendmsg: No buffer space available
- sometimes the effect is less severe: ping times go up to ~10-20 secs, then back to normal (~5ms)
- usually recovers by itself after ~60 secs

Might be similar to Bug #93141

Things become better with
iw wlan0 set power_save off
=> transfer stalls less frequently, maybe every 30 minutes

Kernel 3.19
dmesg: Detected Intel(R) Dual Band Wireless AC 7260, REV=0x144
dmesg: loaded firmware version 23.13.10.0 op_mode iwlmvm

See attached trace over a time period with normal transfer -> stalled transfer -> back to normal.

Thanks a lot. Anything else I can provide?
Comment 1 Stefan Soeffing 2015-02-17 20:20:48 UTC
Created attachment 167401 [details]
sudo trace-cmd record -e iwlwifi -e mac80211 -e iwlwifi_msg
Comment 2 Stefan Soeffing 2015-02-17 20:23:09 UTC
I forgot to mention that I tried the options 11n_disable=1 and wd_disable=1 , but got no improvement.
Comment 4 Stefan Soeffing 2015-02-19 18:37:32 UTC
Thanks Emmanuel,

Ucode identified itself as version 23.14.10.0.

Unfortunately, I get the same behavior (doesn't matter if power_save is enabled or disabled).
Comment 5 Emmanuel Grumbach 2015-02-19 19:43:50 UTC
Ok - please try this one:
https://git.kernel.org/cgit/linux/kernel/git/iwlwifi/linux-firmware.git/plain/iwlwifi-7260-12.ucode

if that doesn't work - I guess I'll have to really look at the traces ;)
Comment 6 Emmanuel Grumbach 2015-02-19 20:14:10 UTC
can't see anything obvious from your trace.
Can you get me a sniffer capture?
Do you have another WiFi device you could put in monitor mode?
Comment 7 Stefan Soeffing 2015-02-20 08:10:02 UTC
I'll try that, but it will take a few days.

I haven't played around with sniffers for a while, so I'm not sure about the sniffer settings.

Is sniffing in monitor mode more helpful to you than promiscious mode? If I understand correctly, in monitor mode you won't be able to read the transferred data since my network is set up with WPA2...

Anything I'd need to consider?

Thanks for helping.
Comment 8 Stefan Soeffing 2015-02-20 10:04:42 UTC
With firmware version 25.15.12.0: No improvement, unfortunately.
Comment 9 Emmanuel Grumbach 2015-02-20 12:13:11 UTC
ok - we need to debug the firmware.
I'll send instruction on how to do that on Sunday.
Comment 10 Stefan Soeffing 2015-02-22 16:51:50 UTC
BTW, I just realized that the effect is worse when the wireless network is under load (video streaming from router to a 3rd device) - the connection stalls more often and takes longer to come back.
Comment 11 Emmanuel Grumbach 2015-02-22 18:21:10 UTC
That is interesting...
And it clearly points to the firmware.
I'll give the instructions shortly.
Comment 12 Emmanuel Grumbach 2015-02-22 19:03:18 UTC
Created attachment 167901 [details]
Core9 firmware with uSniffer

Please use the firmware attached.

Load iwlwifi with fw_monitor=1 as a module parameter.
When you feel that the bug is happening, *quickly* do the following:

echo 1 > /sys/kernel/debug/iwlwifi/*/iwlmvm/fw_restart

This will crash the firmware which will create a devcoredump entry:

cat /sys/devices/virtual/devcoredump/devcd1/data > iwl.dump
echo 1 > /sys/devices/virtual/devcoredump/devcd1/data

iwl.dump should weigh around 4.2MB. Please compress it and attach it to the bug.

Please the time to read this note:
https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi#privacy_aspects
Comment 13 Emmanuel Grumbach 2015-02-22 19:05:06 UTC
correction, if you don't want to have the data attached to the bug, please send by email to me.
Note that the data is very obfuscated. Your call.
Comment 14 Stefan Soeffing 2015-02-22 19:37:12 UTC
Even with sudo it says
bash: /sys/kernel/debug/iwlwifi/*/iwlmvm/fw_restart: Permission denied

CONFIG_IWLWIFI_DEBUGFS is probably not enabled, which means I need to recompile the kernel module, is that right?
Comment 15 Stefan Soeffing 2015-02-22 19:53:24 UTC
Well, CONFIG_IWLWIFI_DEBUGFS is set, in fact, but CONFIG_IWLWIFI_DEBUG is not.

Any hint what I'm missing?
Comment 16 Emmanuel Grumbach 2015-02-22 19:53:36 UTC
No.

Sudo doesn't apply on the redirection. You should sudo su - and then run the command.
Comment 17 Stefan Soeffing 2015-02-22 20:31:57 UTC
Created attachment 167911 [details]
iwl.dump

Thanks, kind of a beginner's mistake...

Please see the attached file.
Comment 18 Emmanuel Grumbach 2015-02-22 20:51:22 UTC
ok - I'll send the data to the firmware team.
Are you sure that you did
echo 1 > /sys/.../../../fw_restart while the bug was happening?
Comment 19 Stefan Soeffing 2015-02-22 20:56:13 UTC
Pretty sure. I had ping running, then packets started dropping as usual and I got the 'ping: sendmsg: No buffer space available' message.

Immediately after I did the fw_restart, ping recovered.

I realized that this was actually with power_save on. Would you like me to get another dump with power_save off?
Comment 20 Emmanuel Grumbach 2015-02-22 20:58:51 UTC
not needed at that stage.

The fact that you had the pings recover by restarting the firmware is very interesting.
I'll add all this to the internal bug I'll open on the firmware team.
Comment 21 Stefan Soeffing 2015-02-22 21:02:32 UTC
Worst case would be that it recovered by itself and I was just a split-second too late in doing the fw_restart.

I'll try to reproduce it the next time I see the bug happening.

Thanks a lot.
Comment 22 Emmanuel Grumbach 2015-02-23 00:47:05 UTC
Before I involve firmware guys, can you please reproduce the bug including the FW log thing while tracing is running?
The flags you used in comment #1 are perfect.
I would like to clear out something from the driver level.

Thanks.
Comment 23 Stefan Soeffing 2015-02-23 07:24:43 UTC
Created attachment 167941 [details]
sudo trace-cmd record -e iwlwifi -e mac80211 -e iwlwifi_msg

Same thing, ping recovered right after fw_restart.
Comment 24 Stefan Soeffing 2015-02-23 07:26:55 UTC
Created attachment 167951 [details]
iwl.dump

Corresponding fw dump
Comment 25 Emmanuel Grumbach 2015-02-23 12:14:55 UTC
Thanks - the data has been sent to the firmware team
Comment 26 Xavier Miller 2015-03-04 19:13:09 UTC
Hi!

Kernel 3.19.0-gentoo SMP PREEMPT x86_64 Intel(R) Core(TM) i5-4200U CPU @ 1.60GHz GenuineIntel GNU/Linux

04:00.0 Network controller: Intel Corporation Wireless 7260 (rev 73)
        Subsystem: Intel Corporation Wireless-N 7260

Tested with iwlwifi-7260-10.ucode and iwlwifi-7260-12.ucode: too many disconnections. This is worse with V12.

With iwlwifi-7260-9.ucode, it's better  unless the driver warning: "Firmware has old API version, expected v10 through v12, got v9."

Will watch this bug report :-)

Kind regards,
Xavier Miller.
Comment 27 Emmanuel Grumbach 2015-03-04 19:17:44 UTC
@Xavier,

please follow the procedure from comment 12 so that I can collect data for the firmware team.
Don't forget to replace the firmware.

Thanks.
Comment 28 Xavier Miller 2015-03-04 20:23:46 UTC
Emmanuel,

How can I enable iwlwifi debug? I've activated it in the kernel:
[*] Enable full debugging output in the iwlwifi driver
[ ] Experimental uCode support
[*] iwlwifi device access tracing

But I don't see /sys/kernel/debug/iwlwifi entry (there is a /sys/kernel/debug entry)

I had micro-lags watching youtube videos, and I would crash the module as described in the comment 12, but without luck as now.

Xavier.
Comment 29 Xavier Miller 2015-03-04 20:27:55 UTC
Meanwhile, here are dmesg output during some youtube lags:
[  361.721019] iwlwifi 0000:04:00.0: U iwl_pcie_send_hcmd_sync Attempting to send sync command ADD_STA
[  361.721024] iwlwifi 0000:04:00.0: U iwl_pcie_send_hcmd_sync Setting HCMD_ACTIVE for command ADD_STA
[  361.721114] iwlwifi 0000:04:00.0: I iwl_pcie_hcmd_complete Clearing HCMD_ACTIVE for command ADD_STA
[  361.721135] iwlwifi 0000:04:00.0: U iwl_mvm_sta_rx_agg RX BA Session started in fw
[  361.959954] iwlwifi 0000:04:00.0: U iwl_pcie_send_hcmd_sync Attempting to send sync command ADD_STA
[  361.959962] iwlwifi 0000:04:00.0: U iwl_pcie_send_hcmd_sync Setting HCMD_ACTIVE for command ADD_STA
[  361.960050] iwlwifi 0000:04:00.0: I iwl_pcie_hcmd_complete Clearing HCMD_ACTIVE for command ADD_STA
[  361.960064] iwlwifi 0000:04:00.0: U iwl_mvm_sta_rx_agg RX BA Session stopped in fw
[  361.960072] iwlwifi 0000:04:00.0: U iwl_pcie_send_hcmd_sync Attempting to send sync command ADD_STA
[  361.960075] iwlwifi 0000:04:00.0: U iwl_pcie_send_hcmd_sync Setting HCMD_ACTIVE for command ADD_STA
[  361.960163] iwlwifi 0000:04:00.0: I iwl_pcie_hcmd_complete Clearing HCMD_ACTIVE for command ADD_STA
[  361.960171] iwlwifi 0000:04:00.0: U iwl_mvm_sta_rx_agg RX BA Session started in fw
[  361.973692] iwlwifi 0000:04:00.0: U iwl_pcie_send_hcmd_sync Attempting to send sync command ADD_STA
[  361.973697] iwlwifi 0000:04:00.0: U iwl_pcie_send_hcmd_sync Setting HCMD_ACTIVE for command ADD_STA
[  361.973782] iwlwifi 0000:04:00.0: I iwl_pcie_hcmd_complete Clearing HCMD_ACTIVE for command ADD_STA
[  361.973806] iwlwifi 0000:04:00.0: U iwl_mvm_sta_rx_agg RX BA Session started in fw
[  362.349189] iwlwifi 0000:04:00.0: U iwl_pcie_send_hcmd_sync Attempting to send sync command ADD_STA
[  362.349196] iwlwifi 0000:04:00.0: U iwl_pcie_send_hcmd_sync Setting HCMD_ACTIVE for command ADD_STA
[  362.349279] iwlwifi 0000:04:00.0: I iwl_pcie_hcmd_complete Clearing HCMD_ACTIVE for command ADD_STA
[  362.349289] iwlwifi 0000:04:00.0: U iwl_mvm_sta_rx_agg RX BA Session stopped in fw
[  362.349295] iwlwifi 0000:04:00.0: U iwl_pcie_send_hcmd_sync Attempting to send sync command ADD_STA
[  362.349297] iwlwifi 0000:04:00.0: U iwl_pcie_send_hcmd_sync Setting HCMD_ACTIVE for command ADD_STA
[  362.349378] iwlwifi 0000:04:00.0: I iwl_pcie_hcmd_complete Clearing HCMD_ACTIVE for command ADD_STA
[  362.349385] iwlwifi 0000:04:00.0: U iwl_mvm_sta_rx_agg RX BA Session started in fw
[  562.739045] iwlwifi 0000:04:00.0: U iwl_pcie_send_hcmd_sync Attempting to send sync command ADD_STA
[  562.739052] iwlwifi 0000:04:00.0: U iwl_pcie_send_hcmd_sync Setting HCMD_ACTIVE for command ADD_STA
[  562.739246] iwlwifi 0000:04:00.0: I iwl_pcie_hcmd_complete Clearing HCMD_ACTIVE for command ADD_STA
[  572.780630] iwlwifi 0000:04:00.0: U iwl_pcie_send_hcmd_sync Attempting to send sync command ADD_STA
[  572.780636] iwlwifi 0000:04:00.0: U iwl_pcie_send_hcmd_sync Setting HCMD_ACTIVE for command ADD_STA
[  572.781014] iwlwifi 0000:04:00.0: I iwl_pcie_hcmd_complete Clearing HCMD_ACTIVE for command ADD_STA
[  621.009543] iwlwifi 0000:04:00.0: U iwl_pcie_send_hcmd_sync Attempting to send sync command SCAN_OFFLOAD_REQUEST_CMD
[  621.009549] iwlwifi 0000:04:00.0: U iwl_pcie_send_hcmd_sync Setting HCMD_ACTIVE for command SCAN_OFFLOAD_REQUEST_CMD
[  621.009716] iwlwifi 0000:04:00.0: I iwl_pcie_hcmd_complete Clearing HCMD_ACTIVE for command SCAN_OFFLOAD_REQUEST_CMD
Comment 30 Emmanuel Grumbach 2015-03-05 13:41:29 UTC
(In reply to Xavier Miller from comment #28)
> Emmanuel,
> 
> How can I enable iwlwifi debug? I've activated it in the kernel:
> [*] Enable full debugging output in the iwlwifi driver
> [ ] Experimental uCode support
> [*] iwlwifi device access tracing
> 
> But I don't see /sys/kernel/debug/iwlwifi entry (there is a
> /sys/kernel/debug entry)
> 

You need debugfs (IWLWIFI_DEBUGFS)
Comment 31 Xavier Miller 2015-03-05 18:46:16 UTC
Created attachment 169311 [details]
dump

dump during a disconnect
Comment 32 Emmanuel Grumbach 2015-03-05 20:15:26 UTC
you forgot fw_monitor=1

And please, make sure that you capture the data by restarting *quickly* when the throughput goes down.

Thanks.
Comment 33 Xavier Miller 2015-03-05 22:04:45 UTC
Created attachment 169361 [details]
dump (with firmware tracing enabled)
Comment 34 Xavier Miller 2015-03-05 22:06:02 UTC
oops sorry. Let's hope that dump will be useful for you.
Comment 35 Xavier Miller 2015-03-06 08:11:11 UTC
Created attachment 169511 [details]
dump (with firmware tracing enabled)

other dump. Scenario : using this computer as distcc server, launching heavy compilations on other computer, then watching the load. When it is down (and seeing a network client disconnected), I restart the firmware and get this dump.
Comment 36 Emmanuel Grumbach 2015-03-08 11:17:45 UTC
thanks - these look good.
Comment 37 Stefan Soeffing 2015-04-07 15:59:50 UTC
Guys, is there any news on that?

I switched to a 3.19.2 kernel with firmware 25.16.12.0, which unfortunately gave no improvement at all.

To be honest, I'm pretty unhappy with the situation.

Thanks for your help, anyway.
Comment 38 Emmanuel Grumbach 2015-04-07 16:43:57 UTC
I opened a ticket on the firmware team. I can't do much more than that unfortunately.
Comment 39 sheksis 2015-04-09 13:58:35 UTC
I think that with more people moving to kernel 3.19(for which only ucode 10 and 12 work), this will be a big issue. I have struggled with this issue for around one month now. Now running kernel 3.18 and ucode 9 and things are okay. However, by staying on 3.18, I miss out on the updates that my distro offers.
Comment 40 hashem@riseup.net 2015-04-10 00:02:47 UTC
This issue has gone away since I turned off WPA2 on my access point.
Comment 41 sheksis 2015-04-10 01:57:08 UTC
My issue happens with WEP on my access point too. Pretty consistently reproducible. Play a couple of youtube videos and ping latencies drop to seconds instead of ms.
Comment 42 sheksis 2015-04-10 02:23:23 UTC
(In reply to hashem@riseup.net from comment #40)
> This issue has gone away since I turned off WPA2 on my access point.

Also, this couldn't be an access point issue because "Kernel 3.18 + ucode-9" works for me while "Kernel 3.19 + ucode-10/12" does not. There is a regression in this combo somewhere.
Comment 43 Emmanuel Grumbach 2015-04-11 19:42:18 UTC
Can you try 3.19 + ucode-9?
This combo is supported even if the driver will warn about the firmware being old.

Thanks.
Comment 44 sheksis 2015-04-12 05:36:37 UTC
(In reply to Emmanuel Grumbach from comment #43)
> Can you try 3.19 + ucode-9?
> This combo is supported even if the driver will warn about the firmware
> being old.
> 
> Thanks.
Emmanuel,
Thanks for engaging on this. 3.19 + ucode-9 works. So I guess this squarely puts
all the blame on ucode-10, ucode-12.

Also, this testing was done on both WEP and WPA2 with the same results.

Thanks.
Comment 45 sheksis 2015-04-12 06:38:10 UTC
(In reply to sheksis from comment #44)
> (In reply to Emmanuel Grumbach from comment #43)
> > Can you try 3.19 + ucode-9?
> > This combo is supported even if the driver will warn about the firmware
> > being old.
> > 
> > Thanks.
> Emmanuel,
> Thanks for engaging on this. 3.19 + ucode-9 works. So I guess this squarely
> puts
> all the blame on ucode-10, ucode-12.
> 
> Also, this testing was done on both WEP and WPA2 with the same results.
> 
> Thanks.

Okay, declared victory too soon. There might be something in 3.19 + ucode-9 combo too.
The youtube videos started getting stuck after 15 minutes of play. I also noticed that the average ping latency for 3.19+ucode-9 was more than 3.18+ucode-10/12.

Summary:
1. 3.18 + ucode-9 is better than 3.19 + ucode-9 which is better than 3.19+ucode-10/12.

2. I see less ping latencies going to seconds in 3.18 + ucode-9. And when they do, they recover very fast.

3. In 3.19 + ucode-9 has more ping latencies going to seconds and sometimes they recover fast enough for a video playback. But many videos do end up getting stuck.

4. 3.19 + ucode-10/12 --- *FUBAR*
Comment 46 sheksis 2015-04-12 06:40:12 UTC
Correction in above post: It should be "...3.19+ucode-9 was more than 3.18+ucode-9".
Comment 47 Emmanuel Grumbach 2015-04-12 07:57:28 UTC
@sheksis:

Please open a new bug. CC ilw@linux.intel.com to the bug. I prefer not to mix at this stage to avoid even more confusion.

Please attach dmesg output + wpa_supplicant output + tracing output for each one of the scenario you mentioned above.

Instructions about tracing:
https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi#tracing

Thanks.
Comment 48 Emmanuel Grumbach 2015-04-14 06:21:52 UTC
Created attachment 173981 [details]
Core9 firmware with uSniffer

Hi all,

I have a request from the firmware team.
I attached a new firmware with more debug data. Can you please repeat the data collection process with this firmware?

It also has a few more bug fixes, so the problem might be gone / harder to reproduce.

Thanks!
Comment 49 Xavier Miller 2015-04-14 12:34:55 UTC
Hi!

With kernel 4.0.0 and attachment 173981 [details], I don't see any lag for now.

Watched some HD videos in Youtube + compilations using distcc, no disconnections so far.

It sounds encouraging.
Comment 50 Emmanuel Grumbach 2015-04-16 14:00:19 UTC
Any news from other users?
Comment 51 Stefan Soeffing 2015-04-16 16:04:49 UTC
Thanks Emmanuel and the firmware team.

I've installed 4.0.0 and attachment 173981 [details] on Tuesday, but hadn't much time to test it.

Nevertheless, I haven't had any disconnect so far which is encouraging, indeed. From time to time, ping goes up for a few seconds - but the effect is much less pronounced than before.

I will keep testing for a few days and report back.
Comment 52 Emmanuel Grumbach 2015-04-16 16:33:44 UTC
The firmware team has released a firmware with the fix:

https://git.kernel.org/cgit/linux/kernel/git/iwlwifi/linux-firmware.git/plain/iwlwifi-7260-12.ucode

I'll send a pull request to the official linux-firmware.git soon.
Comment 53 sheksis 2015-04-16 16:56:22 UTC
Same observation as Stefan.
Comment 54 Enrico Tagliavini 2015-04-17 17:38:03 UTC
Have an Intel Dual Band Wireless AC 7265 (so slightly different HW), REV=0x210 here, from Intel NUC NUC5i5RYH. I'm affected by the same problem. Using the firmware provided by default in Fedora 21 (version 25.15.12.0) the situation is quite severe as in the "no data transfer, but wireless is connected" and the "ping: sendmsg: No buffer space available" are very frequent. Note that when it happens the card usually *does not recover* and a full system reboot is needed. Yes disconnecting and reconnecting fixes the issue, but for a very short time usually, restarting has better chances. There is some lucky day here and there, but usually it is a matter of watching a single youtube video in HD and the issue appears quite easily. Using kernel 3.17.9 instead of 3.19.3 might have a slightly better working ratio, but still breaks.

I tried firmware version 25.17.12.0 from iwlwifi/linux-firmware.git . There is a clear improvement. The card manage to recover now, sometimes takes few minutes but it recovers. Pinging it from the AP usually increase of 3 order of magnitude during the "crysis" periods (let's say from 15 ms to > 5000, sometimes 15000). This is still with kernel 3.19.3 as provided by fedora 21.

Today I took the kernel SRPM version 4.0.0-1 for fedora 22 and recompiled it for fedora 21, keeping the firmware to version 25.17.12.0. I can see no improvement compared to kernel 3.19.3. Maybe it might be even slightly worst, I got wireless disconnecting with 4.0, but I'm not sure if I did the test with 3.19.3 with power management enabled or not, with 4.0 PM was enabled when the disconnect happened. I will do more test with 3.19.3.

Compared to another computer with an older card and looking into the AP interface I can see something interesting. While the signal quality for both clients is very similar (around 50%, -56 db) the TX and RX rates are very different: 1M (very unstable with some, short, peak maybe at 10M, sometimes more) for the NUC, 65M - 130M (and quite stable) for the other computer. However if I run the wavemon utility (or iwconfig) on the NUC it reports a bitrate of 78 Mbit, so it seems the AP and the NUC are not agreeing.

Note this is with 11n 2.4 GHz only AP. I tried also to force 11g only mode on the AP and it made no difference.

I'm quite happy it doesn't get stuck now, but please keep working on this, the current state is not really what I would define usable yet. Using VoIP, video chat, VOD, streams can be a pain if the card has one void moment. Even SSH gets stuck and can be painful.
Comment 55 Enrico Tagliavini 2015-04-17 17:57:43 UTC
Ok 3.19.3 is the same as 4.0, depending on the day you get disconnects or not. Yesterday I was lucky, today I get disconnected.
Comment 56 Xavier Miller 2015-04-18 14:13:20 UTC
I confirm I have less disconnections, but there still happen some. There are less frequent and shorter, but they still exist.

Kernel 4.0.0
Comment 57 Emmanuel Grumbach 2015-04-18 17:44:38 UTC
Thanks to all of you for your time testing.

I'll bring all this to the firmware team and I will discuss the next steps with them.
In the meantime, Enrico, please share your dmesg output. I'd like to check if the Tx queues get stuck.
Thanks.
Comment 58 Xavier Miller 2015-04-18 18:22:48 UTC
Created attachment 174371 [details]
dmesg output

Linux redpoint 4.0.0-gentoo #1 SMP PREEMPT Mon Apr 13 20:19:30 CEST 2015 x86_64 Intel(R) Core(TM) i5-4200U CPU @ 1.60GHz GenuineIntel GNU/Linux

with proposed GIT firmware
Comment 59 Emmanuel Grumbach 2015-04-18 19:06:19 UTC
Thanks Xavier.
Clearly, you are getting Tx queue hang.

There is another bug on that:
https://bugzilla.kernel.org/show_bug.cgi?id=95941

I'll wait a few more days to get more reports (I'd like to see the dmesg output from Enrico), but I will probably close this bug then and we will concentrate our effort on bug 95941.
Comment 60 Xavier Miller 2015-04-18 19:11:40 UTC
Thanks Emmanuel, I will register on bug 95941
Comment 61 Enrico Tagliavini 2015-04-19 13:41:31 UTC
Created attachment 174471 [details]
dmesg output NUC iwl 7265 kernel 3.19.3

This is dmesg out from two days ago with kernel 3.19.3. Today is one of the good days and it is going to be harder to reproduce the issue (as I mentioned already it is like a Russian roulette, something it works like a charm sometimes it doesn't. with the newest firmware the chances of having a bad one are fewer). Also the affected system is 500 Kms away from me, it's my parents pc, not mine. I have an hard time debugging the issue since I'm accessing via the unstable wireless and my parents struggles starting firefox cannot ask their help.

Kernel 4.0 dmesg has also a trace that can be related. attaching it in a minute.
Comment 62 Enrico Tagliavini 2015-04-19 13:46:09 UTC
Created attachment 174481 [details]
dmesg output NUC iwl 7265 kernel 4.0

this is the one with the trace, near the end after 45 minutes of use. However you can see there are many disconnection in the mean time. Those are not due to bad signal or manual disconnection.

Also full disconnections are way rarer than bad performance, but that produce no output in dmesg.
Comment 63 Emmanuel Grumbach 2015-04-19 14:02:15 UTC
Ok - this is something completely different.
Please loook at bug 95901
Comment 64 Emmanuel Grumbach 2015-04-19 18:20:56 UTC
@Enrico,

please open a new bug and attach the output of your supplicant with -dd to get more debug.

Thanks.

Anybody else about this bug?
The stuck queue issue is monitored in another bug, so I am considering closing this one.
Comment 65 Stefan Soeffing 2015-04-19 18:47:55 UTC
Emmanuel,

I'm pretty happy with the new combination so far. Connection is much more stable. Sometimes the ping rate goes up to ~1s, but only for a short time (few seconds). Attaching a dmesg output - is that the Tx queue problem as well?

Thanks a lot!
Comment 66 Stefan Soeffing 2015-04-19 18:48:51 UTC
Created attachment 174491 [details]
dmesg output - Tx queue problem?
Comment 67 Emmanuel Grumbach 2015-04-19 18:52:31 UTC
nope - this is not the Tx queue problem.
The tx queue problem will happen only when we can't get any packet out for 10 seconds. You are far from this.

Ok - the original submitter is happy. A lot has gone on on this one.
Closing it now.
As a reminder, Enrico needs to open a new bug, and anyone with the tx queue problem can look at bug 95941.

FYI - the tx queue problem gives you a very detailed information in the dmesg output - you can't really miss it.
Comment 68 Enrico Tagliavini 2015-04-20 19:02:08 UTC
Hi Emmanuel, I opened bug #96931 and while doing that potentially made a very interesting discovery.

Thank you for looking into this.
Comment 69 sheksis 2015-04-27 10:31:24 UTC
This bug requires further investigation. Look at this!

64 bytes from 192.168.1.1: icmp_seq=57 ttl=254 time=11631 ms
64 bytes from 192.168.1.1: icmp_seq=58 ttl=254 time=10633 ms
64 bytes from 192.168.1.1: icmp_seq=59 ttl=254 time=13219 ms
64 bytes from 192.168.1.1: icmp_seq=60 ttl=254 time=12936 ms
64 bytes from 192.168.1.1: icmp_seq=61 ttl=254 time=12628 ms
64 bytes from 192.168.1.1: icmp_seq=62 ttl=254 time=14109 ms
64 bytes from 192.168.1.1: icmp_seq=63 ttl=254 time=13543 ms
64 bytes from 192.168.1.1: icmp_seq=64 ttl=254 time=15495 ms
64 bytes from 192.168.1.1: icmp_seq=65 ttl=254 time=16075 ms
64 bytes from 192.168.1.1: icmp_seq=67 ttl=254 time=16165 ms
64 bytes from 192.168.1.1: icmp_seq=68 ttl=254 time=16663 ms
64 bytes from 192.168.1.1: icmp_seq=69 ttl=254 time=17606 ms

I'll file a new bug and upload the traces, dmesg and wpa_supplicant output there with the firmware that Emmanuel provided. One question, I do not know how to collect the wpa_supplicant output. Are there steps anywhere to guide on this?
Comment 70 Stefan Soeffing 2015-04-27 20:41:21 UTC
You might want to have a look at bug 97291 as well...
Comment 71 sheksis 2015-04-28 10:59:31 UTC
@Stefan
Seems very similar to my issue. I'll monitor bug 97291.