Bug 173101
Summary: | Iwlwifi: 8260: TFD queue hang | ||
---|---|---|---|
Product: | Drivers | Reporter: | Tobias Schmetzer (tschmetzer) |
Component: | network-wireless | Assignee: | DO NOT USE - assign "network-wireless-intel" component instead (linuxwifi) |
Status: | CLOSED WILL_NOT_FIX | ||
Severity: | normal | CC: | h8uvkyqnpfrzqyc, linuxwifi, luca |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
See Also: | https://bugzilla.kernel.org/show_bug.cgi?id=172431 https://bugzilla.kernel.org/show_bug.cgi?id=183321 | ||
Kernel Version: | 4.8 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | dmesg, syslog, 3 (probably equal) iwldumps; iwlwifi-8000C-22.ucode firmware with debugging enabled; attachment-15437-0.html |
Description
Tobias Schmetzer
2016-09-28 19:21:51 UTC
Hi, can you try the workarounds described here? https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi#about_platform_noise Thanks.

- USB3 doesn't seem to be the problem. I couldn't turn it off, but using it or not didn't change anything.
- I haven't tried changing the wifi's power save mode, as the problem occurs right after startup.
- The workaround cfg80211_disable_40mhz_24ghz=1 on its own (without any other workaround) made the wifi work, and the error message disappeared.
- The workaround 11n_disable=1 on its own (without any other workaround) also made the wifi work.

=> Maybe a duplicate of https://bugzilla.kernel.org/show_bug.cgi?id=172431

I could not switch to 5 GHz, as my router only offers 2.4 GHz. But when running "iwlist wlp1s0 scanning", I found that there are 3 routers (including mine) using the same channel 6, although they seem to be assigned to 3 different cells.

I'll soon provide you with a debug version of the firmware for further investigation.

Created attachment 241461: iwlwifi-8000C-22.ucode firmware with debugging enabled

Can you take some logs with the attached firmware, following the instructions here? https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging#firmware_debugging This will allow us to further debug the issues you are experiencing. And please make sure you read and understand the privacy aspects of sending such logs to us: https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging#privacy_aspects

I'm unable to reproduce the error at the moment. I'll try to find out why during the next couple of days. I've already reverted the iwlwifi modprobe options to their original values, but it still works. There has also been an update to my Ubuntu 16.10, though.

I'm seeing the same issues on the same card, also on Ubuntu 16.10. I will email the firmware debug file and dmesg output.
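The channel 6 scan and the cfg80211_disable_40mhz_24ghz=1 result are easier to follow with the 2.4 GHz channel arithmetic in mind. A 40 MHz (HT40) channel pairs the primary 20 MHz channel with a secondary ("extension") channel 20 MHz away, which in 2.4 GHz numbering is 4 channel numbers: a station on channel 6 also uses channel 2 (HT40-) or channel 10 (HT40+), and roughly speaking that extension channel must also be sensed idle before a 40 MHz transmission. The minimal Python sketch below assumes channels 1 to 13 with center = 2407 + 5 * channel MHz and treats every channel as a nominal 20 MHz block (a simplification). The function names are hypothetical, and this is not how iwlwifi or its firmware make any decision.

```python
from __future__ import annotations

# Hypothetical helpers: nominal 2.4 GHz channel arithmetic only.
# Assumes channels 1-13, 5 MHz spacing, and 20 MHz-wide channels.


def center_mhz(channel: int) -> int:
    """Center frequency in MHz of 2.4 GHz channel 1-13."""
    if not 1 <= channel <= 13:
        raise ValueError(f"unsupported 2.4 GHz channel: {channel}")
    return 2407 + 5 * channel


def extension_channel(primary: int, above: bool) -> int:
    """Secondary (extension) channel of an HT40+ (above) or HT40- pair."""
    secondary = primary + 4 if above else primary - 4
    center_mhz(secondary)  # raises if the pair does not fit in the band
    return secondary


def ht40_edges_mhz(primary: int, above: bool) -> tuple[int, int]:
    """Lower and upper edge in MHz of the combined 40 MHz block."""
    secondary = extension_channel(primary, above)
    mid = (center_mhz(primary) + center_mhz(secondary)) // 2
    return mid - 20, mid + 20


def overlaps_extension(neighbor: int, primary: int, above: bool) -> bool:
    """True if a nominal 20 MHz neighbor channel overlaps the extension."""
    ext = center_mhz(extension_channel(primary, above))
    return abs(center_mhz(neighbor) - ext) < 20


for above in (False, True):
    lo, hi = ht40_edges_mhz(6, above)
    sign = "+" if above else "-"
    print(f"ch6 HT40{sign}: extension ch{extension_channel(6, above)}, "
          f"{lo}-{hi} MHz")
```

In this model a network on channel 11 overlaps the HT40+ extension of channel 6, while one on channel 1 does not; with a 20 MHz-only setup (what cfg80211_disable_40mhz_24ghz=1 forces in 2.4 GHz) the extension channel simply stops mattering.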
Additional information:

>ethtool -i wlp4s0 | grep firmware
firmware-version: 22.391740.0
>uname -a
Linux x1ii 4.8.0-22-generic #24-Ubuntu SMP Sat Oct 8 09:15:00 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>lspci -v
04:00.0 Network controller: Intel Corporation Wireless 8260 (rev 3a)
	Subsystem: Intel Corporation Wireless 8260
	Flags: bus master, fast devsel, latency 0, IRQ 130
	Memory at e1100000 (64-bit, non-prefetchable) [size=8K]
	Capabilities: <access denied>
	Kernel driver in use: iwlwifi
	Kernel modules: iwlwifi

Laptop model: Lenovo ThinkPad X1 Carbon 2016 (4th gen). I've been seeing the same problem on previous kernel versions.

The error has finally occurred again. It looks like the error doesn't appear all the time, as I was unable to reproduce it during the last couple of days (I didn't have time to check every day). It feels like there might be one device (or kind of device) that interferes but isn't always turned on. I will append the three dumps I made. I sent the dumps by mail.

When using the workaround, I no longer get the queue error or any other error from iwlwifi. But apart from slower performance, I also have a bad connection with high latency and packet loss, as a ping to my router shows:

64 bytes from 192.168.0.1: icmp_seq=584 ttl=64 time=16.9 ms
64 bytes from 192.168.0.1: icmp_seq=585 ttl=64 time=81.5 ms
64 bytes from 192.168.0.1: icmp_seq=587 ttl=64 time=98.3 ms
64 bytes from 192.168.0.1: icmp_seq=588 ttl=64 time=876 ms
64 bytes from 192.168.0.1: icmp_seq=590 ttl=64 time=9188 ms
64 bytes from 192.168.0.1: icmp_seq=604 ttl=64 time=8486 ms
64 bytes from 192.168.0.1: icmp_seq=606 ttl=64 time=8694 ms
64 bytes from 192.168.0.1: icmp_seq=608 ttl=64 time=8112 ms

Perhaps that helps you find the problem. If this has nothing to do with the firmware crash, I will open a new ticket.

OK, the data you attached is instructive. You are hitting the "CCA on extension channel" problem.
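The ping excerpt above can be summarized mechanically to quantify the loss and latency. A minimal Python sketch, assuming the iputils reply format shown (icmp_seq=N ttl=T time=X ms) and treating every icmp_seq missing between the first and last reply as lost; requests before or after the excerpt are not counted. The function name summarize_ping is hypothetical.

```python
import re
import statistics

# Matches per-reply lines such as
# "64 bytes from 192.168.0.1: icmp_seq=584 ttl=64 time=16.9 ms"
REPLY = re.compile(r"icmp_seq=(\d+) ttl=\d+ time=([\d.]+) ms")


def summarize_ping(lines):
    """Summarize loss and latency for a window of ping reply lines.

    Gaps in icmp_seq between the first and last reply count as lost.
    """
    replies = []
    for line in lines:
        match = REPLY.search(line)
        if match:
            replies.append((int(match.group(1)), float(match.group(2))))
    if not replies:
        raise ValueError("no ping replies found")
    seqs = [seq for seq, _ in replies]
    times = [ms for _, ms in replies]
    expected = max(seqs) - min(seqs) + 1
    received = len(set(seqs))
    return {
        "expected": expected,
        "received": received,
        "lost": expected - received,
        "median_ms": statistics.median(times),
        "max_ms": max(times),
    }


# The excerpt from this report.
EXCERPT = """\
64 bytes from 192.168.0.1: icmp_seq=584 ttl=64 time=16.9 ms
64 bytes from 192.168.0.1: icmp_seq=585 ttl=64 time=81.5 ms
64 bytes from 192.168.0.1: icmp_seq=587 ttl=64 time=98.3 ms
64 bytes from 192.168.0.1: icmp_seq=588 ttl=64 time=876 ms
64 bytes from 192.168.0.1: icmp_seq=590 ttl=64 time=9188 ms
64 bytes from 192.168.0.1: icmp_seq=604 ttl=64 time=8486 ms
64 bytes from 192.168.0.1: icmp_seq=606 ttl=64 time=8694 ms
64 bytes from 192.168.0.1: icmp_seq=608 ttl=64 time=8112 ms
""".splitlines()

print(summarize_ping(EXCERPT))
```

On this excerpt, only 8 of the 25 requests between icmp_seq 584 and 608 got a reply (17 lost), with a median round trip of about 4.5 s and a maximum of 9188 ms.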
In that sense, it is a duplicate of https://bugzilla.kernel.org/show_bug.cgi?id=172431

The high latencies you see when you have the workaround in place are interesting. Can you create a manual dump when those occur, using the fw_dbg_collect debugfs hook? Thanks.

My previous comment was for Tobias.

And yes, it is indeed related to interference. What you can try is to see if an earlier version of the firmware works better for you.

Glad you were able to analyze the problem! I will try to do the manual coredump to discover the reason for the latency. The high latency doesn't always show up either, but it definitely did for a certain time (this might again be interference from a device that was switched on at that time). What range of earlier versions do you propose? And under what circumstances? Do you mean checking whether the workaround works better with a firmware version earlier than 22?

Can you try -21.ucode, for example?

Hi Mo, I got the data you sent privately. You are suffering from the same problem as Tobias, which is that we constantly have CCA on the extension channel. I'll let Luca collapse all the bugs as duplicates for now. FYI, from my experience, these bugs are hard to address and fix, and the firmware team is really busy, so the iterations are long...

Thanks for looking at it, Emmanuel. FYI: I tried the -21.ucode firmware, same thing. Also for me, cfg80211_disable_40mhz_24ghz=1 doesn't fix the issue, but the workaround 11n_disable=1 does.

Created attachment 242121: attachment-15437-0.html

Makes sense: your problem is on the extension channel at 5 GHz, which explains why cfg80211_disable_40mhz_24ghz doesn't help (it only affects 2.4 GHz) and also why 11n_disable helps.

I had sent you the manual dump, and you said you didn't see anything special and needed the firmware team for further analysis. As the problem occurred again, I did some more testing and found out:

- Although I have 40 MHz turned off, there's high latency and noticeable packet loss, with almost no data coming through right after opening a connection (for example, to an arbitrary website). The firmware doesn't crash, though, and there's no problem reported in syslog.
- After disabling 11n in iwlwifi.conf, the problem is gone, but the speed is of course pretty low. A speed test of the internet connection on Linux (40 MHz and 11n disabled) now averages 1.5M/0.8M/25ms (down/up/ping), whereas with the Windows driver I reach an average of 11M/0.8M/25ms (down/up/ping).

Sorry, I need to correct myself: even with 11n disabled, the high latencies show up and only occasionally do I get a couple of bytes through (even normal websites take ages to load completely).

@Tobias: thanks. The thing is that at this point, we need real firmware engineers to look at this :) I am a driver guy who triages the problems, but at this stage, we really want the firmware guys to check what happens. What I can do is prepare a debug version of the firmware that will be less tolerant to latencies, so that the FW crash happens sooner. We can hope that the FW crash will trigger even without 40 MHz (and maybe even without 11n), so that we get valuable data. What is the ping latency you see when things go wrong?

@Tobias, I think I'd like you to open a new bug for the issues without 11n. It can't be the same problem.

BTW: if the data is encrypted, you can safely store it in bugzilla. Nobody can read it anyway, and it makes it a bit easier for us to track which dump matches which scenario.

Since it is a CCA problem, we can't do much, unfortunately. Closing as will not fix.