Created attachment 204391 [details] dmesg I am using a Lenovo ideapad U330p with an Intel Wireless-N 7260 (rev 73 / 0x144) inside and Gentoo outside. Wireless does not work unless I set the iwlwifi option 11n_disable=1. Until recently I was quite pessimistic about the chances of making it work with the 40MHz band. I accidentally discovered that 802.11n works fine on battery, but dies soon after the AC adapter is connected. I have tried the following with no observable effect: (a) 3 different AC adapters (b) fixing the CPU frequency (c) different power_save and power_scheme values What else can I try to help debug it? Thanks in advance -L P.S. The logs were collected with vanilla 4.4.2. I started wget <huge-file> after boot. AC power was connected at ~375s uptime; wget had successfully downloaded ~3GB by that time and stalled immediately afterwards. I can send the firmware dump if needed.
This is most likely caused by interference from the charger... Can you try with power_scheme=1 as a module parameter to iwlmvm? I'll send a debug firmware later. That will allow us to know why the firmware is unhappy (most probably because of the interference mentioned above).
Created attachment 204401 [details] Core14 FW with debug probes Please use the firmware attached and follow the instructions here: https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging#firmware_debugging Take the time to read the privacy notice: https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging#privacy_aspects
# just an interference or some particular electric noise from the adapter? # power_scheme=1 - I have tried it earlier to no avail. Should I produce dmesg as well?
(In reply to Lev Melnikovsky from comment #3) > # power_scheme=1 - I have tried it earlier to no avail. Should I produce > dmesg as well? Nope. Just check you see: [ 385.290146] iwlwifi 0000:02:00.0: 0x00000084 | NMI_INTERRUPT_UNKNOWN as well. This is enough to assume it is the same problem. BTW - did you try 11n with 40MHz on 5.2GHz?
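A minimal sketch of how the check described above could be automated. It scans saved dmesg text for the NMI_INTERRUPT_UNKNOWN line quoted in this thread. The function name `find_nmi_errors` and the regular expression are illustrative assumptions based only on the log format shown here, not part of any iwlwifi tooling.

```python
import re

# Matches the firmware error-log line quoted in this thread, e.g.
# "[  385.290146] iwlwifi 0000:02:00.0: 0x00000084 | NMI_INTERRUPT_UNKNOWN"
# (pattern is an assumption derived from that single example)
NMI_LINE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+iwlwifi\s+(?P<dev>\S+):\s+"
    r"(?P<code>0x[0-9A-Fa-f]{8})\s+\|\s+NMI_INTERRUPT_UNKNOWN"
)


def find_nmi_errors(dmesg_text):
    """Return (timestamp_seconds, pci_device, error_code) for each matching line."""
    hits = []
    for line in dmesg_text.splitlines():
        match = NMI_LINE.search(line)
        if match:
            hits.append(
                (float(match.group("ts")), match.group("dev"), match.group("code"))
            )
    return hits
```

An empty list would mean the line was not found in the supplied text; one or more tuples would suggest the same failure signature as in the original report.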
> Please use the firmware attached and follow the instructions here
I've sent the dump via email.

> Nope. Just check you see:
> [ 385.290146] iwlwifi 0000:02:00.0: 0x00000084 | NMI_INTERRUPT_UNKNOWN
> as well. This is enough to assume it is the same problem.
Yes, I do see it with power_scheme=1 (and your debug firmware):
[ 47.166725] iwlwifi 0000:02:00.0: Start IWL Error Log Dump:
[ 47.166730] iwlwifi 0000:02:00.0: Status: 0x00000000, count: 6
[ 47.166735] iwlwifi 0000:02:00.0: Loaded firmware version: 17.288042.0
[ 47.166740] iwlwifi 0000:02:00.0: 0x00000084 | NMI_INTERRUPT_UNKNOWN
[ 47.166745] iwlwifi 0000:02:00.0: 0x00800634 | uPc
[ 47.166750] iwlwifi 0000:02:00.0: 0x00000000 | branchlink1
[ 47.166754] iwlwifi 0000:02:00.0: 0x00000B30 | branchlink2
[ 47.166758] iwlwifi 0000:02:00.0: 0x000167DC | interruptlink1

> BTW - did you try 11n with 40MHz on 5.2GHz?
Now I wonder if my card should support 5.2GHz band? I have only tried 2.4GHz.

Another observation: if I set 11n_disable=1 (w/ or w/o AC adapter), then I get max throughput about 2.5 MByte/s. If I remove 11n_disable and work on battery, then the throughput is 10 MByte/s (probably limited by 100Mbps ethernet at the access point). Does it make sense?
(In reply to Lev Melnikovsky from comment #5) > > > BTW - did you try 11n with 40MHz on 5.2GHz? > Now I wonder if my card should support 5.2GHz band? I have only tried 2.4GHz. Just checked - your card doesn't support 5.2GHz. But working in 40MHz in 2.4GHz is not really recommended. Also, take a look at https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi#about_platform_noise > > Another observation: if I set 11n_disable=1 (w/ or w/o AC adapter), then I > get max throughput about 2.5 MByte/s. If I remove 11n_disable and work on > battery, then the throughput is 10 MByte/s (probably limited by 100Mbps > ethernet at the access point). Does it make sense? 2.5 MBytes/s = 20Mb/s, which is what you can expect without 11n. 10MByte/s = 80Mb/s, which can be a decent throughput for 11n in certain cases.
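The unit conversion in the reply above is just a factor of 8 (bits per byte). A small sketch, with the hypothetical helper name `mbytes_to_mbits`, also shows why the 100Mbps Ethernet link mentioned in the quote would cap the download at roughly 12.5 MByte/s before protocol overhead:

```python
def mbytes_to_mbits(mbytes_per_second):
    """Convert a throughput in MByte/s to Mb/s (8 bits per byte)."""
    return mbytes_per_second * 8


def mbits_to_mbytes(mbits_per_second):
    """Convert a throughput in Mb/s to MByte/s."""
    return mbits_per_second / 8


if __name__ == "__main__":
    print(mbytes_to_mbits(2.5))   # throughput seen with 11n_disable=1
    print(mbytes_to_mbits(10))    # throughput seen with 11n on battery
    print(mbits_to_mbytes(100))   # ceiling of a 100Mbps Ethernet link
```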
> 2.5 MBytes/s = 20Mb/s which is what you can expect without 11n. > 10MByte/s = 80Mb/s which can be a decent throughput for 11n in certain case. Sorry, I naively assumed that Shannon theorem predicts twice throughput for 40MHz vs 20MHz band. I was also deceived by iw reporting tx bitrate 54 Mb/s (vs 150 Mb/s with N enabled). > But working in 40MHz in 2.4GHz is not really recommended. I can not replace the card due to Lenovo BIOS whitelist... > Also, take a look at > https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi#about_platform_noise Want me to look at the AC adapter output with an oscilloscope?
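For reference, the Shannon-Hartley argument mentioned above can be sketched numerically. The capacity is C = B * log2(1 + SNR), so doubling the bandwidth doubles the capacity only if the SNR stays the same. If the transmit power is fixed and the noise density is flat, doubling the bandwidth also doubles the noise power (about 3 dB lower SNR), so the gain is somewhat less than 2x. The SNR values below are purely illustrative assumptions, not measurements from this laptop.

```python
import math


def shannon_capacity_mbps(bandwidth_mhz, snr_db):
    """Shannon-Hartley capacity in Mb/s for a given bandwidth (MHz) and SNR (dB)."""
    snr_linear = 10 ** (snr_db / 10)
    return bandwidth_mhz * math.log2(1 + snr_linear)


if __name__ == "__main__":
    # Same SNR in both channel widths: exactly twice the capacity.
    print(shannon_capacity_mbps(40, 25) / shannon_capacity_mbps(20, 25))
    # Fixed transmit power: the wider channel sees ~3 dB less SNR,
    # so the ratio drops below 2.
    print(shannon_capacity_mbps(40, 22) / shannon_capacity_mbps(20, 25))
```

This is a theoretical upper bound only; it says nothing about interference or about the shared medium.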
Created attachment 204421 [details] firmware dump [encrypted]
(In reply to Lev Melnikovsky from comment #7) > Sorry, I naively assumed that Shannon theorem predicts twice throughput for > 40MHz vs 20MHz band. I was also deceived by iw reporting tx bitrate 54 Mb/s > (vs 150 Mb/s with N enabled). Should be so. 40MHz should give twice the throughput of 20MHz under ideal conditions. Due to interference and the shared nature of the medium, it can be sometimes better to work in 20MHz only, especially on 2.4GHz as I said in the wiki page. 150Mb/s means that you don't have 40MHz enabled, or you don't have SISO. I can't remember which right now. > > > But working in 40MHz in 2.4GHz is not really recommended. > I can not replace the card due to Lenovo BIOS whitelist... You can still limit yourself to 20MHz with the cfg80211 module parameter. > > > Also, take a look at > https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi#about_platform_noise > Want me to look at the AC adapter output with an oscilloscope? That's nice to offer, but I am not sure I'll be able to understand anything from the output :) Anyway, I got your firmware dump, I'll take a look later.
> 150Mb/s means that you don't have 40MHz enabled, or you don't have SISO. I > can't remember which right now. it means only single stream, I guess http://mcsindex.com/
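The guess above can be checked against the 802.11n rate formula. This is a sketch assuming the standard HT parameters: 52 data subcarriers at 20MHz and 108 at 40MHz, 64-QAM (6 bits per subcarrier) with coding rate 5/6 for the top single-stream MCS, and a symbol time of 3.6 us with short guard interval or 4.0 us with long guard interval. The function name `ht_phy_rate_mbps` is made up for this example.

```python
from fractions import Fraction

# Data subcarriers per 802.11n (HT) channel width in MHz
DATA_SUBCARRIERS = {20: 52, 40: 108}


def ht_phy_rate_mbps(bandwidth_mhz, streams, bits_per_subcarrier=6,
                     coding_rate=Fraction(5, 6), short_gi=True):
    """PHY rate in Mb/s; the defaults correspond to 64-QAM 5/6 with short GI."""
    symbol_us = Fraction(36, 10) if short_gi else Fraction(4)
    bits_per_symbol = (DATA_SUBCARRIERS[bandwidth_mhz] * bits_per_subcarrier
                       * coding_rate * streams)
    return float(bits_per_symbol / symbol_us)


if __name__ == "__main__":
    print(ht_phy_rate_mbps(40, 1))  # one stream, 40MHz, short GI
    print(ht_phy_rate_mbps(20, 1))  # one stream, 20MHz, short GI
    print(ht_phy_rate_mbps(40, 2))  # two streams, 40MHz, short GI
```

Under these assumptions, 150 Mb/s is consistent with a single spatial stream on a 40MHz channel with short guard interval, while 54 Mb/s is the top legacy 802.11g rate.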
I looked at the dump file you created. Seems like at some point, we just stop sending any data. This can be caused by interference. I am involving the firmware team. Thank you.
Hello, I got a reply from the firmware team. We had issues with the probe enabled in the firmware and this explains the strange things I saw. The firmware team will fix the probes and then we will need another dump from you. Thank you.
Created attachment 208381 [details] Core14 FW with debug probes Here is the firmware we need you to use to get the debug data. Thank you!
Created attachment 208571 [details] firmware dump [encrypted]
This time it *seemed* more stable. The "Microcode SW error" was eventually triggered, but it took much longer. Are you sure that only the debug probes were changed?
It should be debug probe changes only. I'll take a look at the dump a bit later. Is it the same story as before (with the power adapter and 11n_disable)?
I transferred the data to the firmware team.
There is bad interference. I can see that each time you want to transmit anything, our receiver detects energy in the air and hence can't transmit. The whole log is full of: trying to Tx, aborting Tx due to Rx. I'd need to change the timeout of the hang detection to see when it starts to happen, but I am pessimistic since you say that it is AC / DC related, which hints at platform noise.
Hello, > Is the same story as before (with the power adapter and 11n_disable)? I had not tried 11n_disable with this firmware. Should I? This time the "bug" was not triggered immediately after connecting AC adapter. It took a minute or two (while downloading at ~10MB/s) before download stalled. I've never observed such stability before. I'll probably try again to see if this is reproducible. Is it necessary to reboot to try different firmware or I can just rmmod iwlmvm/iwlwifi? What is supposed to be a clean test procedure? > I can see that each time you want to transmit anything, our receiver detects > energy in the air and hence can't transmit. Well, this is what wget should look like - Rx lots, Tx acks only. I can try a different pattern, like uploading something, or make it symmetric with balanced up/down loading. Would it give more information? > trying to Tx, aborting Tx due to Rx Sorry for my naive assumption, but this *sounds* like an echo - RF feedback Tx->Rx...
(In reply to Lev Melnikovsky from comment #19) > Hello, > > > Is the same story as before (with the power adapter and 11n_disable)? > I had not tried 11n_disable with this firmware. Should I? Yes please > > Is it necessary to reboot to try different firmware or I can just rmmod > iwlmvm/iwlwifi? What is supposed to be a clean test procedure? modprobe -r iwlwifi is sufficient. As long as you get again the line "Loading firmware XXX" in dmesg > > > I can see that each time you want to transmit anything, our receiver > detects > > energy in the air and hence can't transmit. > Well, this is what wget should look like - Rx lots, Tx acks only. This is not really relevant. We are most probably not talking about real Rx since these would end at some point. But here we *constantly* see energy (which can't be a real wifi Rx). > > I can try a different pattern, like uploading something, or make it > symmetric with balanced up/down loading. Would it give more information? I don't think so. > > > trying to Tx, aborting Tx due to Rx > Sorry for my naive assumption, but this *sounds* like an echo - RF feedback > Tx->Rx... No since we don't get to the Tx part. Before transmitting, we check if someone else is already transmitting (CSMA) and in your case, we keep hearing energy, so we stop our Tx procedure before the radio emitted anything.
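The carrier-sense behaviour described above can be illustrated with a toy model. This is only a sketch of the general CSMA idea, not the firmware's actual logic: the function name `attempt_tx`, the -62 dBm energy threshold, and the requirement of three consecutive idle samples are all assumptions chosen for illustration.

```python
def attempt_tx(energy_samples_dbm, cca_threshold_dbm=-62.0, idle_needed=3):
    """Toy clear-channel assessment.

    Walk through measured energy samples and return the index at which
    transmission could start, i.e. once `idle_needed` consecutive samples
    fall below the threshold. Return None if that never happens.
    """
    idle_run = 0
    for index, energy in enumerate(energy_samples_dbm):
        if energy < cca_threshold_dbm:
            idle_run += 1
            if idle_run >= idle_needed:
                return index
        else:
            # Energy detected: abort and restart the idle count.
            idle_run = 0
    return None


if __name__ == "__main__":
    # Real Wi-Fi traffic ends at some point, so Tx eventually goes out.
    print(attempt_tx([-50, -50, -90, -90, -90]))
    # Constant noise above the threshold: Tx never starts.
    print(attempt_tx([-50] * 1000))
```

In this model, a constant noise source never lets the idle counter reach its target, which matches the "trying to Tx, aborting Tx due to Rx" pattern reported from the log: the radio never emits anything, so an echo of its own transmission cannot be the cause.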
Hi again, I have repeated the test many times and gathered a lot of statistics: (a) the "bug" was never triggered with 11n_disable=1 (b) the "bug" was never triggered w/o AC adapter (c) the "bug" was always there w/ AC adapter and 11n_disable=0 The time required to trigger the bug in scenario (c) may vary from <1s to 5min. It seems that firmware 17.295852.0 sometimes lasts longer than 17.288042.0, but the difference is statistically insignificant. It also seems that a proper power cycle and reboot give better hang recovery than module unloading/reloading, but again this is just a feeling. What happens during the 10000 ms between the wget stall and the complaints from iwlwifi about a stuck queue? Would it be possible to set the hang detection timeout as a module parameter? I wiped Windows off the hard drive immediately after the purchase, so I can not tell if it behaves better or worse...
(In reply to Lev Melnikovsky from comment #21) > > What happens during 10000 ms between wget stall and the woes from iwlwifi > about stuck queue? The firmware is trying to send data ... and can't because we sense energy in the air. > > Would it be possible to set the hung detection timeout as a module parameter? > Yes, but we have another way. We add debug data to the .ucode file and configure the timeout this way. If you want, I can prepare a firmware with debug data that will reduce the timeout.
(In reply to Emmanuel Grumbach from comment #22) > > What happens during 10000 ms between wget stall and the woes from iwlwifi > > about stuck queue? > The firmware is trying to send data ... and can't because we feel energy in > the air. Do you really mean to assume that "we feel energy in the air" for 10000 ms without a stop? Sometimes the "bug" is triggered only after a minute of high traffic. The throughput may fluctuate (probably due to severe interference you mentioned) but this is still >1000 packets per second. And then suddenly wget stops and we can not send a single packet for 10 consecutive seconds? > Yes, but we have another way. We add debug data to the .ucode file and > configure the timeout this way. > If you want, I can prepare a firmware with a debug data that will reduce the > timeout. Honestly, I don't even understand if this timeout should be increased or reduced... But I will try whatever you suggest to gather information you think is important.
(In reply to Lev Melnikovsky from comment #23) > (In reply to Emmanuel Grumbach from comment #22) > > > > What happens during 10000 ms between wget stall and the woes from iwlwifi > > > about stuck queue? > > The firmware is trying to send data ... and can't because we feel energy in > > the air. > Do you really mean to assume that "we feel energy in the air" for 10000 ms > without a stop? Yes - this can happen because of interference or noise created by the platform / power adapter. In any case, I opened a bug on the firmware team, there isn't much more I can do for now.
Created attachment 212461 [details] Core14 FW with LPRX disabled Please retest with this firmware. In this firmware, we disabled a feature that can cause the problem you are seeing and we would like to know if it feels better now. Thank you.
ping?
Created attachment 216621 [details] dmesg with 17.320364.0 Sorry for the delay, I could not exactly reproduce the setup I had at home (I don't have a second computer here to fill the bandwidth). So I redirected the traffic back to the laptop using iptables on an openwrt router. This gives stable ~4MB/s up + 4MB/s down with 11n_disable=0 and AC adapter disabled. (In reply to Emmanuel Grumbach from comment #25) > Please retest with this firmware. In this firmware, we disabled a feature > that can cause the problem you are seeing and we would like to know if it > feels better now. Unfortunately, it *feels* worse. The transfer stops immediately after I connect AC adapter and the connectivity is not restored even after I disconnect it again. I had to rmmod/insmod to make this post. dmesg is attached, the AC adapter is connected at 1090.
Unfortunately this bug is already very old and we haven't really been able to work around these platform issues, so I'll have to close it as won't fix...