Bug 93141

Summary: iwlwifi: 7260: slow traffic (<1Mbps) with -12.ucode
Product: Drivers Reporter: Matthew Rathbone (matthew.rathbone)
Component: network-wireless Assignee: networking_wireless (networking_wireless)
Status: CLOSED INSUFFICIENT_DATA    
Severity: normal CC: hashem, ilw, jasper.mackenzie, soeffing
Priority: P1    
Hardware: Intel   
OS: Linux   
Kernel Version: 3.13 - 3.16 Subsystem:
Regression: No Bisected commit-id:
Attachments: iwlist output
Core9 firmware with uSniffer
iwl dump

Description Matthew Rathbone 2015-02-12 19:26:38 UTC
Created attachment 166621 [details]
iwlist output

Firstly -- thanks to Intel for providing open source drivers for all their network devices. It's really great to have out-of-the-box support for their hardware.

My hardware is a Thinkpad T440s with an Intel 7260 wireless card. I'm currently running Linux Mint 17.1, kernel 3.16, but also had these issues on 3.13. The firmware version used with the card is 25.228.9.0.


On certain wireless networks, regardless of signal strength, I get very poor and spotty performance with my 7260 wireless card.

Other laptops sitting right next to mine get 60-70 Mb/s in throughput; I get 1-10 Mb/s with massive variation and periods of totally stalled data transfer. It makes it very hard to have video calls for work!

My laptop always connects to the 2.x GHz access point, but there are several 5.x GHz channels open too.
I've attached the output of iwlist; it shows two 2.x GHz cells and several 5.x GHz cells.
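
To compare the 2.x GHz and 5.x GHz cells mentioned above, the scan output can be condensed to one line per cell. This is only a minimal sketch: the function name list_cells is made up for illustration, and it assumes the usual iwlist scan layout (a "Cell NN - Address:" line followed by "Frequency:", "Signal level=" and "ESSID:" lines), which can vary between drivers.

```sh
# Reads `iwlist <iface> scan` output on stdin and prints one line per cell:
#   <BSSID> <frequency in GHz> <signal level> <ESSID>
# list_cells is a hypothetical helper name, not part of wireless-tools.
list_cells() {
    awk '
        function emit() { print addr, freq, sig, essid }
        /Cell [0-9]+ - Address:/ {
            if (addr != "") emit()          # flush the previous cell
            addr = $NF; freq = sig = essid = ""
        }
        /Frequency:/     { split($0, a, "Frequency:");     split(a[2], b, " "); freq = b[1] }
        /Signal level=/  { split($0, a, "Signal level=");  split(a[2], b, " "); sig = b[1] }
        /ESSID:/         { split($0, a, "ESSID:");         essid = a[2] }
        END { if (addr != "") emit() }      # flush the last cell
    '
}
```

For example, `sudo iwlist wlan0 scan | list_cells | awk '$2 >= 5'` would show only the 5.x GHz cells, which makes it easier to see whether a stronger 5 GHz BSSID is available.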


Here is the output of my iwconfig:

wlan0     IEEE 802.11abgn  ESSID:"gener8tor"  
          Mode:Managed  Frequency:2.437 GHz  Access Point: 2A:A4:3C:04:7D:89   
          Bit Rate=144.4 Mb/s   Tx-Power=22 dBm   
          Retry short limit:7   RTS thr:off   Fragment thr:off
          Power Management:off
          Link Quality=53/70  Signal level=-57 dBm  
          Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
          Tx excessive retries:12  Invalid misc:1105   Missed beacon:0
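
The counters on the last line (excessive retries, invalid misc, missed beacons) are easier to follow over time than in a single snapshot. The following is a rough sketch for pulling them out of iwconfig output; link_counters is a hypothetical name, and what each counter actually measures depends on the driver, so treat the numbers as a relative indicator only.

```sh
# Reads iwconfig output on stdin and prints the three error counters
# from the "Tx excessive retries" line in key=value form.
# link_counters is a made-up helper name for this illustration.
link_counters() {
    sed -n 's/.*Tx excessive retries:\([0-9]*\).*Invalid misc:\([0-9]*\).*Missed beacon:\([0-9]*\).*/retries=\1 invalid_misc=\2 missed_beacon=\3/p'
}
```

Running `while sleep 5; do iwconfig wlan0 | link_counters; done` during a stall would show whether the counters climb while throughput drops.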



I'm not sure how to debug this further. The routers are provided by the office space, so I can't change channels and haven't tried that, but other non-Linux laptops seem to have no problem whatsoever achieving maximum speed.


Let me know how I can help debug this, any suggestions are of course most welcome.
Comment 1 Emmanuel Grumbach 2015-02-12 19:29:52 UTC
please record tracing:

sudo trace-cmd record -e iwlwifi -e mac80211 -e iwlwifi_msg

that will be a start.
Can you move to 3.18?
Comment 2 Matthew Rathbone 2015-02-12 19:53:05 UTC
Hey,

I guess technically I could, but I'd rather not jump to a kernel unsupported in my main install. I could run a separate linux install to test it, but probably not until next week. Anything else I can do until then?


trace-cmd output:


~/Projects/personal/beekeeper(master)$ sudo trace-cmd record -e iwlwifi -e mac80211 -e iwlwifi_msg
/sys/kernel/debug/tracing/events/iwlwifi/filter
/sys/kernel/debug/tracing/events/*/iwlwifi/filter
/sys/kernel/debug/tracing/events/mac80211/filter
/sys/kernel/debug/tracing/events/*/mac80211/filter
/sys/kernel/debug/tracing/events/iwlwifi_msg/filter
/sys/kernel/debug/tracing/events/*/iwlwifi_msg/filter
Hit Ctrl^C to stop recording
^CKernel buffer statistics:
  Note: "entries" are the entries left in the kernel ring buffer and are not
        recorded in the trace data. They should all be zero.

CPU: 0
entries: 0
overrun: 0
commit overrun: 0
bytes: 168
oldest event ts: 45271.036377
now ts: 45271.041903
dropped events: 0
read events: 1075

CPU: 1
entries: 0
overrun: 0
commit overrun: 0
bytes: 1384
oldest event ts: 45270.937558
now ts: 45271.041957
dropped events: 0
read events: 981

CPU: 2
entries: 0
overrun: 0
commit overrun: 0
bytes: 1992
oldest event ts: 45271.000061
now ts: 45271.041994
dropped events: 0
read events: 988

CPU: 3
entries: 0
overrun: 0
commit overrun: 0
bytes: 2604
oldest event ts: 45271.039992
now ts: 45271.042020
dropped events: 0
read events: 37400

CPU0 data recorded at offset=0x4dd000
    229376 bytes in size
CPU1 data recorded at offset=0x515000
    208896 bytes in size
CPU2 data recorded at offset=0x548000
    208896 bytes in size
CPU3 data recorded at offset=0x57b000
    6017024 bytes in size

trace.dat here - https://www.dropbox.com/s/avoe9fvh9vb2j8n/trace.dat?dl=0
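
As the note in the output says, the per-CPU "entries" should all be zero for a complete capture. A small sketch of that sanity check, assuming the statistics are saved or piped as text in the format shown above (check_trace_stats is a made-up name, not a trace-cmd subcommand):

```sh
# Reads trace-cmd's "Kernel buffer statistics" text on stdin.
# Prints the total number of read events and returns non-zero if any CPU
# reports leftover entries, overruns, commit overruns or dropped events.
check_trace_stats() {
    awk -F': *' '
        $1 == "entries" || $1 == "overrun" || $1 == "commit overrun" || $1 == "dropped events" {
            if ($2 + 0 != 0) bad++
        }
        $1 == "read events" { total += $2 }
        END {
            print "read events total:", total + 0
            exit (bad > 0)
        }
    '
}
```

A non-zero exit status would suggest the trace lost events and may be worth recording again before uploading.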
Comment 3 Matthew Rathbone 2015-02-12 19:55:20 UTC
Also worth noting that originally I had firmware version 23.214.9.0 and saw the same issues.
Comment 4 Emmanuel Grumbach 2015-02-12 20:09:29 UTC
23.214.9.0 can't be any better.
Looking at your traces.
Comment 6 Emmanuel Grumbach 2015-02-12 20:16:00 UTC
everything looks optimal from a driver point of view.
So this must be a firmware issue.
Unfortunately, I can't debug it (separate team). And 3.16 doesn't have the tools I need to fetch debug logs.
You'd have to move to 3.17 at least. Which will also allow you to use a newer firmware.
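
A quick way to check whether a running kernel meets the 3.17 minimum mentioned above is to compare version strings. This is only a sketch: version_at_least is a hypothetical helper, and it relies on `sort -V`, which is a GNU coreutils extension and may not exist on every system.

```sh
# Succeeds if version $1 is at least version $2,
# e.g. version_at_least "3.16.0-30-generic" "3.17" fails.
# Relies on GNU `sort -V` (version sort); this is an assumption about the host.
version_at_least() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

For example: `version_at_least "$(uname -r)" 3.17 && echo "kernel is new enough for the debug tools"`.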
Comment 7 Matthew Rathbone 2015-02-12 20:25:57 UTC
I'm going to download an alpha ubuntu 15.04 build which runs 3.18, and run that off a USB stick.

What commands would you like me to run? I can boot that up for 30 minutes, run stuff, then jump back out to my normal environment.
Comment 8 Emmanuel Grumbach 2015-02-12 20:43:37 UTC
that won't help - you'd need to replace the firmware anyway. To debug the firmware, I need to replace your firmware with a debug image of it.

To be honest, I am not sure the firmware team will have time to look at it anytime soon. But trying -10.ucode might help. And -10.ucode will certainly not be in the live USB stick of Ubuntu 15.04.
Comment 9 Matthew Rathbone 2015-02-13 03:25:36 UTC
Hey,

So if I have some time to try a newer kernel I'll try the -10 ucode. If I do get a chance to try a new kernel, a couple of questions:

- Where do I get debug enabled firmware?
- What tests should I run when I have a debug firmware up and running?

If you can point me in the right direction I'll try to do some tests in the next week. Even if the firmware group doesn't have time to look at it yet, at least the data will be available when they do.


Thanks for spending so much time to help!
Comment 10 Emmanuel Grumbach 2015-02-13 07:16:41 UTC
BTW: did you try to disable power save?
Comment 11 Matthew Rathbone 2015-02-13 15:43:01 UTC
Hey,

Yeah, I disabled power management (that's the same thing, right?). It shows wireless power saving as disabled in powertop.


I actually got bored last night and installed kernel 3.19, so I can tell you later today if that helped. Shockingly there don't seem to be any system stability issues with the new kernel.
Comment 12 Emmanuel Grumbach 2015-02-23 14:18:45 UTC
any news here?
Comment 13 Matthew Rathbone 2015-02-23 22:06:45 UTC
So I updated to 3.19 and the -10.ucode firmware and it helped. Top speeds are much better, still not what they should be, but better.

The connection still seems to degrade, which makes me wonder if it's just not picking the best frequency and access point?

There are two access points, each with a 2.x GHz and a 5.x GHz band, and I find that if I manually switch access points the connection picks up again, although I have never found a way to make it prefer the 5 GHz band.

Let me know how to do some debugging and I'll gladly help.


Matthew
Comment 14 Emmanuel Grumbach 2015-02-24 12:23:12 UTC
Ok - if you have 3.19, you can try -12.ucode as well.
thanks.
Comment 15 Matthew Rathbone 2015-02-25 21:56:31 UTC
Alright, so after some more time using -10 or -12 the issue is clearly still present. 

My initial success with -10 was just lucky and did not continue, and I am usually back in the same place: sub-1mb speeds, sporadically increasing and sometimes dropping off, even when my phone and MacBook get 30mb+.

Again, happy to run a debug firmware and to collect whatever metrics you need.

Matthew
Comment 16 Emmanuel Grumbach 2015-02-26 05:56:00 UTC
Ok - are you testing download or upload?

I will attach a debug image of the firmware to collect logs.
Comment 17 Emmanuel Grumbach 2015-02-26 06:05:41 UTC
Created attachment 168241 [details]
Core9 firmware with uSniffer

Please use the firmware attached.

Load iwlwifi with fw_monitor=1 as a module parameter.
When you feel that the bug is happening, *quickly* do the following:

echo 1 > /sys/kernel/debug/iwlwifi/*/iwlmvm/fw_restart

This will crash the firmware which will create a devcoredump entry:

cat /sys/devices/virtual/devcoredump/devcd1/data > iwl.dump
echo 1 > /sys/devices/virtual/devcoredump/devcd1/data

iwl.dump should weigh around 4.2MB. Please compress it and attach it to the bug.

Please take the time to read this note:
https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi#privacy_aspects
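
The copy, compress and clear steps above can be wrapped in one function so they run quickly once the firmware has been restarted. This is a hedged sketch, not an official tool: save_devcoredump is a made-up name, gzip is just one possible compressor, and the devcdN index may differ from devcd1 on a given system. It has to run as root against the real sysfs path.

```sh
# Copies a devcoredump entry to a file, compresses it, then clears the entry.
#   $1: devcoredump directory, e.g. /sys/devices/virtual/devcoredump/devcd1
#   $2: output file name, e.g. iwl.dump (the result is written to $2.gz)
# save_devcoredump is a hypothetical helper name.
save_devcoredump() {
    devcd=$1
    out=$2
    if [ ! -r "$devcd/data" ]; then
        echo "no dump found at $devcd/data" >&2
        return 1
    fi
    cat "$devcd/data" > "$out" || return 1
    gzip -f "$out" || return 1          # assumption: gzip is acceptable for the attachment
    echo 1 > "$devcd/data"              # writing to data releases the dump, as in the steps above
}
```

After triggering fw_restart as described, calling `save_devcoredump /sys/devices/virtual/devcoredump/devcd1 iwl.dump` would leave iwl.dump.gz ready to attach.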
Comment 18 Emmanuel Grumbach 2015-03-10 21:15:37 UTC
I'll close this bug in a couple of days if the data is not provided.

You will still be able to re-open once you have data.
Comment 19 Jasper Mackenzie 2015-05-07 05:09:10 UTC
Created attachment 176091 [details]
iwl dump
Comment 20 Jasper Mackenzie 2015-05-07 05:11:18 UTC
I have similar/identical symptoms as described here.
Using the latest Linus kernel and firmware from git (today, 4.1-rc2), I tried the attached firmware, loaded with fw_monitor=1.

Before suspend I could not get the normal flaky behaviour. I could send and receive at what I assume is full speed consistently for a long sustained period. 

After resume from suspend the flaky behaviour is immediately evident, so I captured a devcoredump as you describe. 

I will try now with the standard -12.ucode firmware and 3.19 kernel to see if suspend/resume is the issue.

Do you need additional details?
Comment 21 Jasper Mackenzie 2015-05-07 05:26:13 UTC
No such luck.
It is now flaky all the time with the attached firmware, even after powering down for a few minutes to, you know, let all the bad juju out 8(
Comment 23 Jasper Mackenzie 2015-05-07 08:55:52 UTC
Unfortunately the problematic behaviour still exists, although it now takes some indeterminate amount of time longer to manifest.

To reproduce it, I play YouTube videos while streaming the audio to a remote PulseAudio server and pinging the local Google, then start an rsync copy from a mounted NFS share to the local drive. In some runs the audio is sent without stutter; in others it stutters while the rsync transfer speed stays about the same. In the problematic scenario the ping time varies wildly, as does the rsync transfer speed, and the audio is never good. Eventually no data is transferred at all and pings hang indefinitely, at which stage reconnecting to the AP with NetworkManager brings it back for seconds to minutes, but never for long.
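
The "ping time varies wildly" observation can be put into numbers by summarising the round-trip times from a saved ping log. A minimal sketch, assuming the common `time=<ms> ms` reply format printed by Linux ping (rtt_summary is a made-up helper name):

```sh
# Reads ping output on stdin and prints the reply count plus the
# minimum, average and maximum round-trip time in milliseconds.
# Returns non-zero if no replies were found.
rtt_summary() {
    awk '
        match($0, /time=[0-9.]+/) {
            rtt = substr($0, RSTART + 5, RLENGTH - 5) + 0   # strip the "time=" prefix
            if (n == 0 || rtt < min) min = rtt
            if (n == 0 || rtt > max) max = rtt
            sum += rtt
            n++
        }
        END {
            if (n == 0) { print "no replies"; exit 1 }
            printf "replies=%d min=%.1f avg=%.1f max=%.1f\n", n, min, sum / n, max
        }
    '
}
```

For instance, `ping -c 30 <gateway> | rtt_summary` run once during a good period and once during a bad one gives two lines that can be pasted into the bug for comparison.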

Damn. It was looking so good for a while there. Can send another dump if it helps.
Comment 24 Emmanuel Grumbach 2015-05-07 09:01:28 UTC
what does ethtool -i wlan0 say?
Comment 25 Jasper Mackenzie 2015-05-07 09:02:38 UTC
driver: iwlwifi
version: 4.1.0-rc2-latest
firmware-version: 25.17.12.0
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no
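
When several firmware files are being swapped, it helps to confirm which one actually loaded. The sketch below extracts the firmware version from `ethtool -i` output. Both function names are hypothetical, and reading the third field as the number in the -N.ucode file name is an assumption that happens to match the versions quoted in this thread (25.17.12.0 with -12.ucode), not a documented guarantee.

```sh
# Reads `ethtool -i <iface>` output on stdin and prints the firmware version.
firmware_version() {
    sed -n 's/^firmware-version: *//p'
}

# Prints the third dot-separated field of a firmware version string.
# Assumption: for iwlwifi this field corresponds to the -N.ucode API number.
ucode_api() {
    printf '%s\n' "$1" | cut -d. -f3
}
```

For example, `ucode_api "$(ethtool -i wlan0 | firmware_version)"` would print 12 for the output above.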
Comment 26 Emmanuel Grumbach 2015-05-07 09:06:18 UTC
please join bug 97291 and create a dump with the firmware in comment 26 there.