Hello,

I have a problem with my Dell XPS 12. The wireless connection just stops working. NetworkManager still shows me as connected but a ping to the router gives Destination Host Unreachable. Disabling and enabling the Wireless restores the connection for a little while. Sometimes this issue happens very frequently (every few minutes), sometimes I can stay connected for a whole evening.

It seems weird, but in my perception, "seeking" on Youtube videos makes this issue appear more quickly.

It sounds similar to bug 93431 which is supposed to be fixed.

I installed iwlwifi from backport-iwlwifi version 4fd69991
Kernel OpenSUSE 4.1.6-2.gce0123d-desktop
firmware-version: 15.209721.0 from iwlwifi/linux-firmware

Issue has been present with the original drivers that came with Suse as well.

Does this sound like a firmware issue or broken hardware?

I attached a trace-cmd result when this situation happens. A parallel running ping command displays like

64 bytes from 192.168.1.1: icmp_seq=979 ttl=64 time=34.5 ms
64 bytes from 192.168.1.1: icmp_seq=980 ttl=64 time=31.5 ms
64 bytes from 192.168.1.1: icmp_seq=981 ttl=64 time=53.2 ms
64 bytes from 192.168.1.1: icmp_seq=982 ttl=64 time=13.4 ms
64 bytes from 192.168.1.1: icmp_seq=983 ttl=64 time=1.27 ms
64 bytes from 192.168.1.1: icmp_seq=984 ttl=64 time=29.7 ms
From 192.168.1.104 icmp_seq=1014 Destination Host Unreachable
From 192.168.1.104 icmp_seq=1015 Destination Host Unreachable
From 192.168.1.104 icmp_seq=1016 Destination Host Unreachable
From 192.168.1.104 icmp_seq=1017 Destination Host Unreachable

No entry is visible in dmesg
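For anyone trying to line up a ping log with a trace, a small script can pick out the moment the link dies. The following is only a sketch: it assumes ping output in the format shown in the report, and the function name `find_outage` is made up for illustration.

```python
import re

# Patterns for the two kinds of ping lines quoted in the report.
REPLY = re.compile(r"icmp_seq=(\d+) ttl=\d+ time=[\d.]+ ms")
UNREACHABLE = re.compile(r"icmp_seq=(\d+) Destination Host Unreachable")


def find_outage(lines):
    """Return (last_ok_seq, first_failed_seq), or None if no outage is seen."""
    last_ok = None
    for line in lines:
        ok = REPLY.search(line)
        if ok:
            last_ok = int(ok.group(1))
            continue
        bad = UNREACHABLE.search(line)
        if bad and last_ok is not None:
            return last_ok, int(bad.group(1))
    return None


sample = [
    "64 bytes from 192.168.1.1: icmp_seq=983 ttl=64 time=1.27 ms",
    "64 bytes from 192.168.1.1: icmp_seq=984 ttl=64 time=29.7 ms",
    "From 192.168.1.104 icmp_seq=1014 Destination Host Unreachable",
]
print(find_outage(sample))
```

The gap between the two sequence numbers gives a rough window in which to look in the trace-cmd recording.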
(In reply to ailin.nemui+kernel from comment #0)

I didn't manage to time the recording properly so it became unnecessarily large (too large for bugzilla). Maybe you can still use it even if you have to download it from a 3rd party site? Otherwise I'll have to try some more times...
Download site -> http://s000.tinyupload.com/index.php?file_id=45896859822207246811
What does ethtool -i wlan0 say? Please provide dmesg output.
Ah you have a trace recording. I'll look at it. Please provide the ethtool output though.
Thanks for your effort!

ethtool -i wlp4s0
driver: iwlwifi
version: 4.1.6-2.gce0123d-desktop
firmware-version: 15.209721.0
bus-info: 0000:04:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no
Created attachment 185881 [details] dmesg output attached
Do you have another machine that you could put in monitor mode to sniff the traffic? I could fetch the trace-cmd file, but I can't see any problem there. Did you stop it when you had issues?
I hit ^C when I saw the ping failing, so near the end of that trace I could not use the Internet anymore.

I got hold of a computer for traffic monitoring (Intel 3160). Currently the problematic computer has been stable for a day, and since I don't have any real clue yet how to *force* this problem to appear, I am sort of waiting for it to occur again.

I'm not familiar with traffic sniffing: is it appropriate to do the sniffing in monitor mode and look at the encrypted packets? I tried to use wireshark with an established connection and promiscuous mode, but I cannot observe any packets that aren't my own.
Usually the firmware of a wifi card filters out most of the packets that are not interesting (for instance, the packets that are not for your device), among other things, so you can't really get good results when sniffing on the same device you're trying to debug (or any device that is connected, for that matter).

You should have a separate device, set it to monitor mode and move it to the correct channel:

iw dev <interface_name> set type monitor
iw dev <interface_name> set channel <channel_of_your_AP>

Then you can start sniffing with wireshark.

Meanwhile, I'll try to take a look at the trace-cmd logs you provided.
Hmmm... I can't de-gpg the trace file since it's encrypted with Emmanuel's key. He is on extended vacation now, so I can't ask him to open it up. Can you upload it again, maybe using my PGP key? http://pgp.mit.edu/pks/lookup?op=vindex&search=0xA1479CA21A3CC5FA (There you can see that I also work for Intel, though I'm logged in to bugzilla with my private account.)
I re-encrypted the file with your key Luca: https://www.dropbox.com/s/1lj5ki13u4pnscx/trace2.dat.7z.gpg?dl=0 Seems like we need a key for the group :)
Thanks, Emmanuel! Yeah, we should have a private key for our group. I'll create one, then we can discuss how to securely share it among us later on.
Hi, sorry for the late response. Did you manage to see anything in this trace, since Emmanuel did not?

I sent the computer to Dell for repair and they (said they) exchanged the WLAN card; however, the problem persists. In your judgement, does this seem like a potential hardware issue, and if so, would you have any guess as to which part to look at?

I'm still struggling with the recording. I tried it with a Ralink USB adapter on the same computer but I can hardly see any packets in wireshark. I also feel like NetworkManager is interfering with the monitor mode; do you have any further guidance? Anyway, I will keep trying.

Thanks,
Created attachment 188051 [details] wireshark capture of iwlwifi card itself I attach a wireshark recording done directly on the intel card, started after the connection broke down. I have no idea if that is anything useful (probably not) it has the .201 IP. In the middle of the recording, I plugged in the .109 which is the Ralink USB WLAN card
By the way I am under the impression that this problem does not happen with 11n_disable, but I need to investigate this further
Today I spent one evening trying to reproduce this bug. Fruitless! Downloaded several GB without any issue. Aggravating. At least I think I understand how to monitor the traffic now: I need to disable NetworkManager.
Created attachment 189341 [details] this capture done from 2nd device: connectivity is lost -> connectivity restored by hard reset I attach this traffic capture, it starts at a time where the connectivity has already been severed (network is not working anymore), until a hard reset using rfkill, after which network resumes operation. I'm still trying to get a capture which shows working -> broken transition. Wireless was turned off at frame 12499 (deauthentication), reset the wireless with rfkill; then started wget (now working again) mac of broken device is ...:11:7e:cf
Created attachment 189371 [details] capture from 2nd device: connectivity works -> connectivity broken here's the trace where a wget download was running and then stalled
I feel like it is easier to reproduce when the link speed is at 140Mb or so
Thanks for the logs and persistence in trying to reproduce! :) I have reported an internal bug to our firmware team to try to see what is going on. We will keep you updated.
Created attachment 193181 [details] Core14 FW with uSniffer Our firmware team is asking for more detailed firmware logs. Can you please reproduce with the firmware attached? Please capture trace-cmd logs (as before) and, as soon as possible after the problem occurs, capture the firmware logs as explained here: https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging#firmware_debugging Please make sure you read the notice about privacy.
Hello,

It seems I have the same issue. The symptoms are the same: I am connected but I don't have network access. Sometimes it comes back after ~30-60s, but most of the time I need to enable/disable airplane mode.

Here is the dmesg http://paste.vinilox.eu/?e3decaea10a483fb#ir8oUbujuK9yADH4PUvBpqPq12VRdhVJ4BGDJOkEF4s=

What can I do to help if this is the same issue?

Thanks for your help!
Sorry forgot to add : Archlinux everything up to date Kernel 4.3 Firmware 17.228510.0 I tried every firmware from -14 to -17.
@Vinilox, You seem to be suffering from the same problem indeed. Please follow the steps described in comment 20 and provide the debug data. thanks.
Created attachment 194841 [details] trace.dat encrypted vinilox
Created attachment 194851 [details] Dmesg with CONFIG_IWLWIFI_DEBUG=y Here is a new dmesg with CONFIG_IWLWIFI_DEBUG=y. I can't provide a FW debug log because I don't have the /sys/kernel/debug/iwlwifi/ directory. Sorry for that. My kernel config is here http://paste.vinilox.eu/?e163dee924c77cd7#b+nDJygUYlYowU1HcC7IR1Hly/YmRgD2/Y0gHKYW/e0= sorry if I forgot to set a config option. If I set 11n_disable=1, this bug isn't present. Thanks for your help.
You are missing IWLWIFI_DEBUGFS. Ok, thanks. We have a few other reports saying that 11n_disable=1 makes the bug disappear.
That was fast ! Thank you, recompiling right now and sending you the log ASAP.
Created attachment 194861 [details] FW dump Here is the fw dump. I hope it will be useful. If it's not tell me and I will retry. Thank you.
How did you create this fw dump? I don't see that you had the FW crash because of stuck queue here? Actually, did you experience any WiFi malfunction when you created this fw dump?
I did experience a wifi malfunction and just after I saw it in the dmesg I ran cat /sys/devices/virtual/devcoredump/devcd1/data > iwl.dump. I will try again.
Please attach the fw dump / dmesg of the same occurrence. I suggest running with fw_restart=0. This will prevent firmware reload and then you are sure that you have mismatches. Of course, WiFi will not be functional until you reload the modules.
Ok. Trying right now with fw_restart=0.
Created attachment 194871 [details] fwdump Here is the new coredump, I launched cat /sys/devices/virtual/devcoredump/devcd1/data > fwdump after the no connectivity problem. dmesg for this one http://paste.vinilox.eu/?512c1241a8c8757c#NdBH1FChBY/Gjr9cRWGgxudSGsUv9vcZ9FxYbeUdLYs= I hope it will be helpful.
Pasting here the relevant info from dmesg:

[ 46.131973] iwlwifi 0000:01:00.0: Queue 16 stuck for 10000 ms.
[ 46.131995] iwlwifi 0000:01:00.0: Current SW read_ptr 251 write_ptr 48
[ 46.132064] iwl data: 00000000: ff 07 00 00 00 00 00 00 00 00 00 f8 00 00 00 00 ................
[ 46.132094] iwlwifi 0000:01:00.0: FH TRBs(0) = 0x00000000
[ 46.132119] iwlwifi 0000:01:00.0: FH TRBs(1) = 0xc011000a
[ 46.132142] iwlwifi 0000:01:00.0: FH TRBs(2) = 0x00000000
[ 46.132165] iwlwifi 0000:01:00.0: FH TRBs(3) = 0x80300007
[ 46.132188] iwlwifi 0000:01:00.0: FH TRBs(4) = 0x00000000
[ 46.132210] iwlwifi 0000:01:00.0: FH TRBs(5) = 0x00000000
[ 46.132234] iwlwifi 0000:01:00.0: FH TRBs(6) = 0x00000000
[ 46.132256] iwlwifi 0000:01:00.0: FH TRBs(7) = 0x0070907d
[ 46.132319] iwlwifi 0000:01:00.0: Q 0 is active and mapped to fifo 3 ra_tid 0x0008 [8,8]
[ 46.132381] iwlwifi 0000:01:00.0: Q 1 is active and mapped to fifo 2 ra_tid 0x0008 [0,0]
[ 46.132444] iwlwifi 0000:01:00.0: Q 2 is active and mapped to fifo 1 ra_tid 0x0008 [198,198]
[ 46.132506] iwlwifi 0000:01:00.0: Q 3 is active and mapped to fifo 0 ra_tid 0x0008 [0,0]
[ 46.132568] iwlwifi 0000:01:00.0: Q 4 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
[ 46.132617] iwlwifi 0000:01:00.0: Q 5 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
[ 46.132667] iwlwifi 0000:01:00.0: Q 6 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
[ 46.132716] iwlwifi 0000:01:00.0: Q 7 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
[ 46.132766] iwlwifi 0000:01:00.0: Q 8 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
[ 46.132815] iwlwifi 0000:01:00.0: Q 9 is active and mapped to fifo 7 ra_tid 0x0000 [126,126]
[ 46.132865] iwlwifi 0000:01:00.0: Q 10 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
[ 46.132915] iwlwifi 0000:01:00.0: Q 11 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
[ 46.132964] iwlwifi 0000:01:00.0: Q 12 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
[ 46.133014] iwlwifi 0000:01:00.0: Q 13 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
[ 46.133063] iwlwifi 0000:01:00.0: Q 14 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
[ 46.133113] iwlwifi 0000:01:00.0: Q 15 is active and mapped to fifo 5 ra_tid 0x0008 [0,0]
[ 46.133162] iwlwifi 0000:01:00.0: Q 16 is active and mapped to fifo 1 ra_tid 0x0000 [251,48]
[ 46.133212] iwlwifi 0000:01:00.0: Q 17 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
[ 46.133262] iwlwifi 0000:01:00.0: Q 18 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
[ 46.133311] iwlwifi 0000:01:00.0: Q 19 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
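As an aside for readers scanning similar dumps: in the queue list from dmesg, the interesting entries are the active queues whose two pointers differ, which here is only Q 16 with [251,48]. The sketch below picks those out. It is an illustration, not part of the driver: `busy_queues` is a hypothetical helper, and both the ring size of 256 entries and the reading of the bracketed pair as [read_ptr, write_ptr] are assumptions (the latter based on the "Current SW read_ptr 251 write_ptr 48" line).

```python
import re

# Matches lines such as:
#   Q 16 is active and mapped to fifo 1 ra_tid 0x0000 [251,48]
QUEUE = re.compile(
    r"Q (\d+) is (active|inactive) and mapped to fifo (\d+) "
    r"ra_tid 0x[0-9a-fA-F]+ \[(\d+),(\d+)\]"
)

QUEUE_SIZE = 256  # assumption: number of entries in one TX ring


def busy_queues(log_text):
    """Return (queue, fifo, pending) for active queues with frames outstanding."""
    result = []
    for match in QUEUE.finditer(log_text):
        queue, state, fifo, read_ptr, write_ptr = match.groups()
        read_ptr, write_ptr = int(read_ptr), int(write_ptr)
        if state == "active" and read_ptr != write_ptr:
            # The write pointer wraps around the ring, hence the modulo.
            pending = (write_ptr - read_ptr) % QUEUE_SIZE
            result.append((int(queue), int(fifo), pending))
    return result


log = """
iwlwifi 0000:01:00.0: Q 2 is active and mapped to fifo 1 ra_tid 0x0008 [198,198]
iwlwifi 0000:01:00.0: Q 4 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
iwlwifi 0000:01:00.0: Q 16 is active and mapped to fifo 1 ra_tid 0x0000 [251,48]
"""
print(busy_queues(log))
```

Under the ring-size assumption, Q 16 has 53 entries waiting between the two pointers, which fits the "Queue 16 stuck" message.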
Is the fw dump right this time ?
This one is excellent thanks.
I may have to get back to you after I'll talk to FW people (and that may take time)
Great. Good luck and thank you ! No problem, I will do what I can to help.
one last request (for now). Can you please make sure that the bug still exists with the firmware from here: https://git.kernel.org/cgit/linux/kernel/git/iwlwifi/linux-firmware.git/plain/iwlwifi-7260-16.ucode Make sure to delete -17.ucode before you test. No need to create fw_dump this time. Thank you.
Created attachment 194881 [details] dmesg I can confirm that the bug is still here with the new ucode.
Created attachment 194891 [details]
Core14 FW with uSniffer with TX_MNG probes

Hi Vinilox,

I talked with the FW folks. It seems to be power save related. Can you try to disable power save?

sudo iw wlan0 set power_save off

Moreover, I am attaching a new firmware with more probes. Can you please reproduce with this firmware and record tracing? The goal here is to have the tracing output and fw dump of the same run. Please run with fw_restart=0; that will make it easier.

Thanks for your cooperation!
Trying this now. And no problem thank you for your help !
Created attachment 194911 [details] trace
Created attachment 194921 [details] iwl.dump
Created attachment 194931 [details] dmesg And the dmesg. I hope it worked because it crashed just after I set power_save to off, if not tell me and I will retry.
argh wait. Sorry... I forgot to enable debug on the firmware... I'll send a new firmware...
Created attachment 194941 [details] Core14 FW with uSniffer with TX_MNG probes Ok - here we go. Please start tracing as soon as you load the driver. Then, let it go until you hit the bug with fw_restart=0. Thanks!
Ok. Doing this now.
Created attachment 194951 [details] dmesg
Created attachment 194961 [details] trace
Created attachment 194971 [details] iwl.dump It crashes as soon as I set power_save to off and if I understand the dmesg says that too.
Ok thanks. I looked at it along with the FW team. We seem to be hearing energy in the air all the time, which prevents us from transmitting. I wonder why this doesn't happen when 11n_disable is true. Can you please record a trace with 11n_disable? Since you won't have a FW crash this time, you can use the manual trigger:

echo 1 > /sys/kernel/debug/iwlwifi/0000\:0X\:00.0/iwlmvm/fw_dbg_collect

Thanks!
Created attachment 194981 [details]
iwl.dump 11n_disable

I recorded the trace for ~5 min while downloading an Ubuntu ISO with wget. The trace is big and I can't attach it here; you can download it here: http://cloud.vinilox.eu/index.php/s/9BrUPZwaUWDXiie

When I don't have the fw crash (with wifi N), the connection is on average slower than with wifi G. It reaches a higher maximum, but overall it is very variable and slower.

Here I am in a city, in a building with a LOT of N access points, but I also tried in a house with no wifi at all except my N AP and the connection is still very slow (I don't remember if I had a fw crash).

I hope this can help. Tell me if you need something else.
So all the captures you sent until now are in a very busy environment? If the bug also reproduces in a quiet environment, I'll appreciate if you could create a dump there. No need to disable power save. Many thanks.
Yes it's in a very busy environment. I will do this on sunday, I can't go there before sorry. Only dump or dump + trace ? No problem, thank you.
No problem. It is kinda week end anyway :) Dump + dmesg. Thanks.
Ahah true :D Ok, will do. Have a nice week end !
Any news for me?
Created attachment 195191 [details] iwl.dump quiet environment
Created attachment 195201 [details] trace.dat quiet environment
Created attachment 195211 [details] dmesg quiet environment I couldn't reproduce a crash but I got a lot of connection drop/reconnect. I hope it can help. Have a nice week.
Excellent, I will take a look today and forward this to the firmware team. Many thanks for your help.
Thank you. You're welcome, glad I could help. Tell me if you need something else.
Can you try to set cfg80211_disable_40mhz_24ghz to true? This is a module parameter of the cfg80211 module. Thanks.
Trying now.
Created attachment 195381 [details] iwl.dump cfg80211_disable_40mhz_24ghz=true
Created attachment 195391 [details] trace.dat cfg80211_disable_40mhz_24ghz=true
Created attachment 195401 [details]
dmesg cfg80211_disable_40mhz_24ghz=true

With

options iwlwifi fw_restart=0
option cfg80211 cfg80211_disable_40mhz_24ghz=true

Still the crash. My router does use 40MHz but I can also set it to 20 MHz if that can help.
(In reply to Vinilox from comment #68)
> Still the crash. My router does use 40MHz but I can also set it to 20 MHz if
> that can help.

yes please.
Created attachment 195411 [details]
dmesg router set to 20MHz

It seems like this is the problem. I set my router to 20MHz and I was able to download an entire Ubuntu ISO without a firmware crash or deauthentication. The dmesg looks like the wifi G one. The speed is around 4 MB/s with a maximum around 5 MB/s, and it's STABLE (I never got a stable speed with wifi N before).

Do you want the trace.dat and a manual dump?
yes please. Do you have other WiFi devices that operate properly when the bandwidth is set to 40Mhz?
Created attachment 195421 [details] iwl.dump router set to 20MHz
Thanks. We are making progress. One more thing. Can you please send the output of your scan? This will contain information that you may want to encrypt as well. sudo iw wlan0 scan thanks.
Here is the trace.dat https://cloud.vinilox.eu/index.php/s/bOfZuNTn7Ss0JZE file is too big I can't attach it here. I have a Nexus 4, moto 360, Moto G (2nd gen), and 2 MacBook air 2014/2015. I also have this problem with my university wifi and every N wifi I tried.
Created attachment 195431 [details] scan router set to 20MHz
Created attachment 195441 [details] router parameters Here are all the router parameters I can tweak, maybe it could help. Great to see you are making progress. :)
Well... we know more, but this kind of issue is a really big pain for the firmware team. I can't promise anything about when we will have a fix, but at least we know which area the culprit is in.
If this can help, I also have a chromecast (1st gen) which works on wifi N 40MHz. I can't tell you if all the devices I listed are using the 40MHz sorry.
I understand. But now that I know it works with disabling 40MHz or disabling wifi N it's better. Before that the wifi/laptop was unusable. Anyway thank you and the firmware team and good luck !
Your AP is on channel 9 or channel 13? in 20Mhz you are now on Channel13, before that you seemed to be on Channel9?
I am just asking if these devices perform better than Intel while the AP is configured in 40Mhz. I am not asking if they are actually using the 40Mhz spectrum, just asking if they operate properly when the AP is configured as it was.
My AP is now on channel 13, previously (40MHz) it was using channel 9 (primary) and channel 13 (secondary) IIRC. It scans for the best channel. Ok, so yes they all work properly. BTW I just saw I messed up with cfg80211_disable_40mhz_24ghz=Y, so it was still with 40MHz sorry. Do you want me to do it again with AP 40MHz and laptop 20MHz ?
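For reference, the channel numbers translate into frequencies as follows. This is a sketch using the standard 2.4 GHz channel plan (center = 2407 + 5 × channel MHz for channels 1 to 13); the function names are invented for illustration.

```python
def center_mhz(channel):
    """Center frequency in MHz of a 2.4 GHz channel (1-13)."""
    if not 1 <= channel <= 13:
        raise ValueError("expected a 2.4 GHz channel between 1 and 13")
    return 2407 + 5 * channel


def span_40mhz(primary, secondary):
    """Return (low_edge, high_edge) in MHz of a 40 MHz pair of 20 MHz channels."""
    if abs(primary - secondary) != 4:
        raise ValueError("the two 20 MHz halves must be 4 channel numbers apart")
    centers = (center_mhz(primary), center_mhz(secondary))
    return min(centers) - 10, max(centers) + 10


print(center_mhz(9), center_mhz(13))
print(span_40mhz(9, 13))
```

So a 40 MHz configuration on channels 9 and 13 occupies roughly 2442 to 2482 MHz, while 20 MHz on channel 13 alone uses only the upper half of that range.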
(In reply to Vinilox from comment #82)
> BTW I just saw I messed up with cfg80211_disable_40mhz_24ghz=Y, so it was
> still with 40MHz sorry. Do you want me to do it again with AP 40MHz and
> laptop 20MHz ?

Yes please.
Created attachment 195461 [details] iwl.dump cfg80211_disable_40mhz_24ghz=Y
Created attachment 195471 [details]
dmesg cfg80211_disable_40mhz_24ghz=Y

Trace.dat https://cloud.vinilox.eu/index.php/s/5L5pizH1BiJu3Ar

It was slower even within 1 m of the AP (2 MB/s vs 5 MB/s with the AP set to 20MHz), but that might be due to the channel change (8 and 12 this time).
I didn't experience a firmware crash. As always, I was downloading ubuntu.iso with wget.
I tested again with channels 9 and 13 and I don't have the speed issue mentioned in comment 85.
Ok - so bottom line, moving to 20Mhz helps whether it is done from AP or from Client side. Correct?
Sorry for the noise. After more testing : - I don't have firmware crash whether it's done from AP or from client. - But if done from client, the speed is less consistent and slower.
Hi again,

We are trying to define a solution for your case. In order to define a suitable solution from the system point of view, we'd need you to test with the following hack. If it helps, we will have a direction for a better fix:

diff --git a/drivers/net/wireless/iwlwifi/mvm/rs.c b/drivers/net/wireless/iwlwifi/mvm/rs.c
index e23a579..f6fa02d 100644
--- a/drivers/net/wireless/iwlwifi/mvm/rs.c
+++ b/drivers/net/wireless/iwlwifi/mvm/rs.c
@@ -1519,10 +1519,10 @@ static s32 rs_get_best_rate(struct iwl_mvm *mvm,
 
 static u32 rs_bw_from_sta_bw(struct ieee80211_sta *sta)
 {
-	if (sta->bandwidth >= IEEE80211_STA_RX_BW_80)
-		return RATE_MCS_CHAN_WIDTH_80;
-	else if (sta->bandwidth >= IEEE80211_STA_RX_BW_40)
-		return RATE_MCS_CHAN_WIDTH_40;
+//	if (sta->bandwidth >= IEEE80211_STA_RX_BW_80)
+//		return RATE_MCS_CHAN_WIDTH_80;
+//	else if (sta->bandwidth >= IEEE80211_STA_RX_BW_40)
+//		return RATE_MCS_CHAN_WIDTH_40;
 
 	return RATE_MCS_CHAN_WIDTH_20;
 }

thanks.
of course, we'd need you to test this with 40Mhz on both ends.
Created attachment 196961 [details] dmesg with patched kernel
Created attachment 196971 [details] iwl.dump with patched kernel
does it feel better?
Sorry for the delayed answer.

With the 4 lines commented out and 40MHz on both ends, I have no fw crash. I successfully downloaded 2 ubuntu.iso files, but I noticed that within 10 minutes wget hung 2-3 times: it was stuck, dmesg showed nothing, a ping to google appeared to hang as well, and 20-30s later everything was back to normal. Speed was consistent at around 5MB/s (except during the hangs, when it was showing --- B/s) with a maximum of ~7MB/s.

Trace.dat here https://cloud.vinilox.eu/index.php/s/JDtJD7yAx8lnLzI

It definitely feels better since I have no FW crash and only hangs, so the download/transfer was not corrupted. I can use this daily.

Thanks! :)
wait - this is only the start... We wanted you to run this to define a proper solution in the long run. We had originally made lab experiments on this but they weren't conclusive. Your real life experiments are very valuable to us. This gives us a hint as of which system direction we want to take. Now that we know that this helps, we need to properly define the solution and implement it (in firmware most probably). I could manage to have the system and firmware team involved and I really hope that we are now in the direction of a solution. I have no words to express my gratitude to your patience and your priceless input.
I understand. If you need my help, just ask me and I will try to help. I am glad this will be solved and even without a proper fix, I can now use my laptop (without wifi it was useless). Big thumbs up and thanks to you, the system and firmware teams ! You are the ones to thank, not me. :)
*** Bug 107471 has been marked as a duplicate of this bug. ***
Hi Vinilox,

I had a discussion about this issue with the firmware team. They want to explore the PHY direction. Basically what happens is that we keep sensing energy on the extension channel but not on the control channel. This is why we can't transmit data in 40Mhz, and this is also why limiting ourselves to 20Mhz helps (we then don't check what happens on the whole 40MHz bandwidth).

From here, there are 2 possible directions:

1) Limit ourselves to 20Mhz in this case, but some firmware people aren't happy with this approach because it might cause other issues: if we hear energy on the extension channel, the AP should be hearing it as well and hence our transmission in 20Mhz will be botched anyway.

2) Assume that we *wrongly* sense energy on the extension channel and debug why we *think* there is energy when the air might be clean.

For now, we are going for direction number 2. This means that I am working with other people (PHY people) in the firmware team and they are asking more questions :)

We would like to know if, after we reset the firmware (because of the firmware crash), we can have *any* successful transmission in 40Mhz. In order to know that, we need tracing that lasts long enough: before and after the crash. All you need to do is to have enough traffic running (ping -i 0.2 is enough) so that the driver will try to raise its transmission rate after the crash. This will allow us to see if resetting the firmware at least allows transmission in 40Mhz. If that turns out to be the case, it'd mean that there is a bug in the firmware / PHY that can be fixed or at least worked around. If not, it really means that the environment you are working in is such that we will never be able to transmit in 40Mhz and we'll need to look at direction number 1 again.
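To make the reasoning above concrete, here is a deliberately simplified model of the channel check. It is an assumption-laden toy, not how the firmware is implemented: `can_transmit` and its boolean inputs are hypothetical, and real clear-channel assessment involves thresholds and timing that are ignored here.

```python
def can_transmit(width_mhz, control_busy, extension_busy):
    """Toy clear-channel check: a 40 MHz transmission needs both halves idle."""
    if width_mhz == 20:
        # Only the control (primary) channel is checked.
        return not control_busy
    if width_mhz == 40:
        # Energy on either half blocks the wide transmission.
        return not control_busy and not extension_busy
    raise ValueError("this sketch only models 20 and 40 MHz")


# The situation described: energy sensed on the extension channel only.
print(can_transmit(40, control_busy=False, extension_busy=True))
print(can_transmit(20, control_busy=False, extension_busy=True))
```

In this model the 40 MHz attempt is blocked while the 20 MHz one goes through, which matches the observation that limiting the link to 20 MHz avoids the stuck queue. It does not say whether the sensed energy is real (direction 1) or a false detection (direction 2).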
Hi all,

we made some progress with another user not on bugzilla. I am talking about the FW crash only (not high ping latency). Can you please run with power_scheme=1 as a module parameter to iwlmvm?

echo "options iwlmvm power_scheme=1" | sudo tee -a /etc/modprobe.d/iwlwifi.conf

Then reboot.

thanks!
Apparently with "power_scheme=1" the firmware doesn't crash (using the debug firmware you provided me in my original bug report). Further tests will follow. My original bug report is https://bugzilla.kernel.org/show_bug.cgi?id=107471.
(In reply to Ansa89 from comment #101)
> Apparently with "power_scheme=1" the firmware doesn't crash (using the debug
> firmware you provided me in my original bug report).
> Further tests will follow.
>
> My original bug report is https://bugzilla.kernel.org/show_bug.cgi?id=107471.

Ok - so, with power_scheme=1 it doesn't crash, but with power_scheme=2 (which is the default) and iw wlan0 set power_save off, it does crash. Right?
|--------------|------------|------------|
| power_scheme | power_save | RESULT     |
|--------------|------------|------------|
| 2 (default)  | off        | CRASH      |
|--------------|------------|------------|
| 2 (default)  | on         | CRASH      |
|--------------|------------|------------|
| 1            | off        | OK         |
|--------------|------------|------------|
| 1            | on         | NOT TESTED |
|--------------|------------|------------|
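If more parameters get added to the test matrix later, keeping the results in a small table in code makes it easy to see which combinations are still missing. A minimal sketch, with the recorded outcomes taken from the table above and the helper name `untested` made up for illustration:

```python
from itertools import product

# Outcomes reported so far, keyed by (power_scheme, power_save).
results = {
    (2, "off"): "CRASH",
    (2, "on"): "CRASH",
    (1, "off"): "OK",
}


def untested(results, schemes=(1, 2), power_save=("off", "on")):
    """List the (power_scheme, power_save) combinations with no recorded result."""
    return [combo for combo in product(schemes, power_save) if combo not in results]


print(untested(results))
```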
Couldn't be clearer :)
can you please remove power_scheme=1 and do the following (as root): echo lprx=0 > /sys/kernel/debug/ieee80211/phy0/netdev:wlan0/iwlmvm/pm_params This will disable a power save feature that can be causing the troubles. thanks.
Took me a while, but in the end it crashed. I have collected a dump just in case you need it.

Also when collecting the dump I got this oops in dmesg:

------------[ cut here ]------------
WARNING: CPU: 2 PID: 3601 at drivers/net/wireless/iwlwifi/mvm/fw.c:824 iwl_mvm_fw_dbg_collect_desc+0xab/0xc0 [iwlmvm]()
Modules linked in: iwlmvm iwlwifi uas usb_storage hid_generic mac80211 cfg80211 ipv6 rfcomm bnep lp ppdev parport_pc parport fuse snd_hda_codec_hdmi i2c_dev hid_multitouch uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core usbhid v4l2_common hid videodev btusb btrtl btbcm btintel bluetooth joydev snd_hda_codec_realtek snd_hda_codec_generic i915 drm_kms_helper drm snd_hda_intel snd_hda_codec intel_gtt snd_hda_core psmouse agpgart snd_hwdep i2c_algo_bit xhci_pci fb_sys_fops syscopyarea xhci_hcd sysfillrect snd_pcm sysimgblt snd_timer mei_me i2c_i801 snd mei soundcore ideapad_laptop ehci_pci ehci_hcd efivars sparse_keymap i2c_core lpc_ich wmi rfkill serio_raw evdev soc_button_array loop [last unloaded: iwlwifi]
CPU: 2 PID: 3601 Comm: bash Tainted: G W 4.3.3 #1
Hardware name: LENOVO 20268/Cherry 3A Touch, BIOS 7CCN62WW 11/13/2014
 ffffffffc04e0838 ffff880201213d60 ffffffff815df9c8 0000000000000000
 ffff880201213d98 ffffffff81086608 ffff88006dec1428 0000000000000000
 ffff88006dc84290 0000000000000000 0000000000000001 ffff880201213da8
Call Trace:
 [<ffffffff815df9c8>] dump_stack+0x44/0x5c
 [<ffffffff81086608>] warn_slowpath_common+0x88/0xc0
 [<ffffffff810866fa>] warn_slowpath_null+0x1a/0x20
 [<ffffffffc04ac3ab>] iwl_mvm_fw_dbg_collect_desc+0xab/0xc0 [iwlmvm]
 [<ffffffffc04ac424>] iwl_mvm_fw_dbg_collect+0x64/0x80 [iwlmvm]
 [<ffffffffc04cfd5a>] _iwl_dbgfs_fw_dbg_collect_write+0x7a/0xb0 [iwlmvm]
 [<ffffffff811a7488>] __vfs_write+0x28/0xf0
 [<ffffffff81193471>] ? kmem_cache_alloc+0xe1/0x200
 [<ffffffff811b5e9f>] ? getname_flags+0x4f/0x1f0
 [<ffffffff810c1691>] ? percpu_down_read+0x21/0x50
 [<ffffffff811a7b49>] vfs_write+0xa9/0x190
 [<ffffffff811a8856>] SyS_write+0x46/0xa0
 [<ffffffff81c3b21b>] entry_SYSCALL_64_fastpath+0x16/0x6e
---[ end trace 4f15751d1323a623 ]---
iwlwifi 0000:02:00.0: Collecting data: trigger 1 fired.
iwlwifi 0000:02:00.0: FW already configured (0) - re-configuring
So it works better than the normal configuration? Please send me the dump. Thanks
Not really: after the initial crash, following crashes happen often (maybe the firmware reinitialization overwrites "lprx=0" parameter?). Moreover I have to say I'm not in the same environment hence I don't know if this played an important role. Dumps sent by email.
I'll check the persistence of the configuration tomorrow. Can you please send the whole dmesg output about the oops? Thanks.
Dmesg sent by email.
Thanks for the dmesg. I think I found the problem and fixed it in our internal tree. We are analyzing the logs you sent, but it seems that the issue you are seeing with lprx=0 is slightly different although the effect is the same. There is another user that reported that disabling LPRX helped.
Today I will try to do some tests in the original environment (if I get some spare time).
I checked the persistence of the lprx over firmware crash: it is persistent.
@Ansa89: I analyzed the logs with the firmware team experts. What is happening is that the AP is allocating the air for a very long time and we can't get a chance to send our own traffic. When we do get a chance to transmit data, we can't get ACK from the AP. This needs to be further debugged, but this is not the same issue as the original one.
Hello,

Still here and following this issue. I haven't had time to test, sorry. I will be able to send dmesg/crash/fw dump by the end of the week.
(In reply to Emmanuel Grumbach from comment #105)
> can you please remove power_scheme=1 and do the following (as root):
>
> echo lprx=0 > /sys/kernel/debug/ieee80211/phy0/netdev:wlan0/iwlmvm/pm_params
>
> This will disable a power save feature that can be causing the troubles.
>
> thanks.

Tested in the first environment: no crashes, but the connection speed is very unstable; using wget to download a single file I got speeds between 500 KB/s and 2 KB/s (with an average speed of ~10 KB/s).
Moreover wget often shows "---" as current speed.

(In reply to Emmanuel Grumbach from comment #114)
> @Ansa89:
>
> I analyzed the logs with the firmware team experts. What is happening is
> that the AP is allocating the air for a very long time and we can't get a
> chance to send our own traffic. When we do get a chance to transmit data, we
> can't get ACK from the AP.
>
> This needs to be further debugged, but this is not the same issue as the
> original one.

Should I open a separate bug report?
(In reply to Ansa89 from comment #116)
> (In reply to Emmanuel Grumbach from comment #105)
> > can you please remove power_scheme=1 and do the following (as root):
> >
> > echo lprx=0 >
> > /sys/kernel/debug/ieee80211/phy0/netdev:wlan0/iwlmvm/pm_params
> >
> > This will disable a power save feature that can be causing the troubles.
> >
> > thanks.
>
> Tested in the first environment: no crashes, but the connection speed is
> very unstable; using wget to download a single file I got speeds between 500
> KB/s and 2 KB/s (with an average speed of ~10 KB/s).
> Moreover wget often shows "---" as current speed.
>

Ok - the environment is very busy to start with. I'd be glad to have a dump in this situation (same environment and lprx disabled).
You can trigger a dump by doing:

echo 1 > /sys/kernel/debug/iwlwifi/0000\:0X\:00.0/iwlmvm/fw_dbg_collect

>
>
> (In reply to Emmanuel Grumbach from comment #114)
> > @Ansa89:
> >
> > I analyzed the logs with the firmware team experts. What is happening is
> > that the AP is allocating the air for a very long time and we can't get a
> > chance to send our own traffic. When we do get a chance to transmit data,
> we
> > can't get ACK from the AP.
> >
> > This needs to be further debugged, but this is not the same issue as the
> > original one.
>
> Should I open a separated bug report?

I don't think it is worth it at this stage.

In that environment, can you please test this?

diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/fw-api-tx.h b/drivers/net/wireless/intel/iwlwifi/mvm/fw-api-tx.h
index 0036d18..0c6b38c 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/fw-api-tx.h
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/fw-api-tx.h
@@ -189,7 +189,7 @@ enum iwl_tx_pm_timeouts {
  */
 #define IWL_DEFAULT_TX_RETRY 15
 #define IWL_MGMT_DFAULT_RETRY_LIMIT 3
-#define IWL_RTS_DFAULT_RETRY_LIMIT 60
+#define IWL_RTS_DFAULT_RETRY_LIMIT 3
 #define IWL_BAR_DFAULT_RETRY_LIMIT 60
 #define IWL_LOW_RETRY_LIMIT 7
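For anyone repeating the two debugfs steps quoted above, here is a minimal shell sketch. The function names, the overridable `DEBUGFS` variable and the argument layout are assumptions made for illustration; only the two paths and the values written to them come from this thread. The PCI address has to be supplied by the caller (the `0X` in the path above is a placeholder), debugfs must be mounted, and the commands need root.

```shell
#!/bin/sh
# Sketch only: the helper names and the DEBUGFS override are hypothetical.
# The paths and the values written are the ones quoted in this thread.
DEBUGFS="${DEBUGFS:-/sys/kernel/debug}"

# Disable Low Power Rx on one interface, e.g.: disable_lprx phy0 wlan0
disable_lprx() {
    phy="$1"; netdev="$2"
    file="$DEBUGFS/ieee80211/$phy/netdev:$netdev/iwlmvm/pm_params"
    # Fail loudly instead of creating a stray file if the path is wrong.
    [ -e "$file" ] || { echo "missing: $file" >&2; return 1; }
    echo lprx=0 > "$file"
}

# Ask the driver to collect a firmware dump for one device.
# The argument is the device's PCI address as printed in dmesg,
# e.g.: collect_fw_dump 0000:02:00.0
collect_fw_dump() {
    pci="$1"
    file="$DEBUGFS/iwlwifi/$pci/iwlmvm/fw_dbg_collect"
    [ -e "$file" ] || { echo "missing: $file" >&2; return 1; }
    echo 1 > "$file"
}
```

Typical use would be `disable_lprx phy0 wlan0` before starting the download, then `collect_fw_dump <pci-address>` once the slowdown shows up. Quoting the path inside the functions removes the need for the backslash-escaped colons in the one-liner.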
Vinilox? :)
I know that we ask a lot from the submitters, but unfortunately we still need your cooperation to continue investigating this bug. When can we get the required data?
Created attachment 200991 dmesg power_scheme=1
Created attachment 201001 trace.dat power_scheme=1
Created attachment 201011 iwl.dump power_scheme=1
Really sorry for the BIG delay, I had a lot of IRL things to solve... :(
Now I am free to do any testing you want.

Here is a summary:

|--------------|------------|------------|
| power_scheme | power_save | RESULT     |
|--------------|------------|------------|
| 2 (default)  | off        | CRASH      |
|--------------|------------|------------|
| 2 (default)  | on         | CRASH      |
|--------------|------------|------------|
| 1            | off        | NOT TESTED |
|--------------|------------|------------|
| 1            | on         | CRASH      |
|--------------|------------|------------|

What should I test now? I didn't know whether I had to test the same things as @Ansa89; is this the same issue?
Just tell me what you want me to test and I will do it quickly.

Sorry again :(
Thanks. I'll analyze your dump to see if you faced the same issue as before or this is a new issue.
I looked at your logs. I can see the original problem: we still can't get a chance to transmit. This is somewhat disappointing because Ansa89 saw a different issue after disabling power save (Low Power Rx really) and it fixed the problem for a user using 8260. Back to square one...
Indeed it's disappointing. Tell me what you want me to test and I will.

If this helps:
- Laptop with a 7260-N single band (2.4GHz only) http://ark.intel.com/products/75174/Intel-Wireless-N-7260, antennas under my wrists (one antenna left and one right)
  → using this here
  → crash with 40MHz
- Desktop (since yesterday) 7260-AC http://ark.intel.com/products/75439/Intel-Dual-Band-Wireless-AC-7260 with PCI-E adapter http://www.amazon.fr/gp/product/B016RU3T6S?psc=1&redirect=true&ref_=oh_aui_detailpage_o01_s00, 2x 6dB antennas
  → connected to the 2.4GHz wifi for testing
  → no problem with 40MHz
  → N (2.4GHz - 40MHz) and AC (5GHz - 80MHz) work perfectly.

Same AP with one SSID on 2.4GHz and one SSID on 5GHz.

If you want, I can put the laptop card in the desktop to test. I can't do desktop (AC) → laptop because Lenovo uses a whitelist in their BIOS and the AC card is not allowed...
This is really strange... I guess you can try to put the laptop card in the desktop, maybe the problem will be "solved" by the strong antennas.
I will try this and tell you what happened.
So, my laptop died, the motherboard is dead... I will try to reproduce laptop conditions with my PCI-E adapter and the laptop card + antennas.
With the laptop card in the desktop:
- With 6dB antennas → no crash (40MHz), much better speed than in the laptop, same speed as the 7260-AC in N mode.
- With laptop antennas → no crash, always connected to the AP, BUT sometimes the network is unreachable and I have to toggle airplane mode on/off to make it work again; slightly better speed than in the laptop.

I have the impression that just by pulling the antenna cable away slightly (10cm), everything improved → no crash (yet) and connection drops happen less often.

Maybe this issue was with this particular laptop's antenna configuration → the antennas are under the wrists and the antenna cables are pressed tight against the battery.

If you want, I can try again with very short antenna cables.

Sorry that I can't help more with my dead laptop. :(
Just made another test with a short antenna cable surrounded by everything I could put around it → it takes 3-4 attempts (3 tries each) to connect, and I get a lot of "network unreachable" while staying connected to the AP the whole time → I can't get it to crash... Maybe the laptop was emitting some weird EM interference?
I am sorry to hear about your laptop. What is the laptop model? It really looks like an antenna layout problem but I don't understand much about this. I'll ask people who may know more. Can you please record a dump with the setup you were describing in comment #131? You can trigger the dump with the fw_dbg_collect debugfs hook in mvm. Thanks.
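Since the stalls in this thread show up as "Destination Host Unreachable" in a parallel ping, one way to time the dump is to trigger it on the first such line. The following is only a sketch: `wait_for_unreachable` and `FW_DBG_COLLECT` are hypothetical names, and the caller has to set `FW_DBG_COLLECT` to the full path of the `fw_dbg_collect` debugfs file for their own device.

```shell
#!/bin/sh
# Sketch only: wait_for_unreachable and FW_DBG_COLLECT are hypothetical names.

# Read ping output on stdin and echo each line. Return 0 as soon as a
# "Destination Host Unreachable" line is seen, or 1 if the input ends first.
wait_for_unreachable() {
    while IFS= read -r line; do
        printf '%s\n' "$line"
        case "$line" in
            *"Destination Host Unreachable"*) return 0 ;;
        esac
    done
    return 1
}

# Example (not run here; needs root and a mounted debugfs):
#   ping 192.168.1.1 | wait_for_unreachable && echo 1 > "$FW_DBG_COLLECT"
```

Depending on how ping buffers its output when writing to a pipe, detection may lag slightly behind the actual stall, so treat the timing as approximate.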
It was a Lenovo Yoga 2 13" (not Pro). Ok, if you need pictures or something else, tell me. I can take it apart. I will do this tomorrow.
Your latest tests seem to show that you suffered from an antenna issue. This can happen on certain systems and is beyond Intel's responsibility. I will close this issue as will not fix. Despite the fact that this bug will now be closed, I'll still receive notifications about any comment someone may add to this bug.
Note that Intel's privacy policy forbids me to keep the data collected in this bug. I am now deleting all the captures from our systems.
I followed "duplicate" links from my bug report 107861 to this one, and I am quite puzzled as to what exactly the solution is here. I seem to have exactly the same problem as the original reporter, but on a Dell M3800 laptop, so it is not very likely to be exactly the same antenna issue.