Bug 48921
Summary: | iwlwifi triggers HW restart every 300 seconds | ||
---|---|---|---|
Product: | Drivers | Reporter: | szczarek (szczarek) |
Component: | network-wireless | Assignee: | drivers_network-wireless (drivers_network-wireless) |
Status: | CLOSED DUPLICATE | ||
Severity: | high | CC: | akhan, alan, anarsoul, andrewd18, arthur.titeica, assertnull, camden.lindsay+kernel, david, djdjaa89, drivers_network-wireless, fab, hendry, ilw, johannes, kernel.org, kernel.org, linville, stf_xl, the.aidar |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 3.8 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
andrewd18_dmesg_output
lspci -vv output |
Description
szczarek
2012-10-16 11:59:21 UTC
HW: HP 8530w, WiFi HW: 03:00.0 Network controller: Intel Corporation Ultimate N WiFi Link 5300

I am experiencing the same problem with the Centrino N 2200 series. The options wd_disable and 11n_disable only delay the restarts; they still happen, quite randomly. But there's one thing common to all of the iwlwifi crashes I have seen so far: they all start with "fail to flush all tx fifo queues" messages.

You guys are definitely not alone here with this 'feature'. Take a look here:

https://bugzilla.redhat.com/show_bug.cgi?id=805285
https://bugzilla.redhat.com/show_bug.cgi?id=833117 (unfortunately, the reporter there talked himself into the false idea that his RAM and cheap AP, rather than iwlwifi, were to blame)
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/984552 (a lot of conjecture about 802.11n being the cause)
http://www.linuxforums.org/forum/wireless-internet/188886-problem-connect-wifi-my-asus-laptop.html
http://ubuntuforums.org/showthread.php?t=1941350-
http://askubuntu.com/questions/153092/cant-find-intel-wireless-n-1000-after-waking-from-sleep
https://bugzilla.redhat.com/show_bug.cgi?id=825491

I have the same "iwlwifi 0000:03:00.0: fail to flush all tx fifo queues" with a 5300AGN here, with all of the firmwares (iwlwifi-5000-ucode-8.83.5.1-1.tar.gz, iwlwifi-5000-ucode-8.83.5.1-1.tgz, iwlwifi-5000-ucode-8.24.2.12.tgz and iwlwifi-5000-ucode-5.4.A.11.tar.gz) from intellinuxwireless.org. So, yeah, wifi and Linux is apparently still a bad idea. I am waiting here for the shit to hit the fan, and then maybe whiphy Intel will in fact jump in.

I noticed *_idle errors in a few backtraces, so I went ahead and started trying different CPU idle and power options. I tested setting the BIOS Adaptive Thermal Management to maximum on battery, and then disabled the PCI Express power management features. It still happened, quite randomly. So I went ahead and started testing different kernel patches.
I stumbled upon the Brain Fuck Scheduler, and since patching it in I have not yet hit the problem of iwlwifi getting no reply after 3000 ms, reporting that it is unable to flush all tx fifo queues, or queues getting stuck or full. To @Aidar: it is not necessarily all the kernel's fault; rather, it is mostly Intel's iwlwifi driver that is thoroughly screwed up. At best, Intel and its developers seem to be turning a deaf ear and a blind eye to all these bug reports. Well done, Intel; you had better not make any more networking hardware, as nothing tells me you are capable of doing that. Oh, and by the way, I set up a server machine with an on-board Intel network controller. You know what happened with its e1000e driver? It freaking crashed after a few hours. Not cool.

It looks like the problem is caused by NetworkManager triggering periodic scanning (every 5 minutes), so perhaps using disable_hw_scan=1 can work around this bug.

@Stanislaw Gruszka: Newer iwlwifi does not seem to provide that option anymore. Also see the references to other tx fifo issues mentioned by Aidar; not all of those crashes are caused by frequent scans. It is the Intel developers' lack of interest that is leading to such trashy behavior. Just look at those increasingly agitated bug reports and at their responses, if there are any!

Oh, right, that option was removed. But there is possibly one more workaround: periodic scanning can be disabled in NetworkManager by setting the BSSID field in the connection options to the MAC address of the AP. The bug originally reported here indicates a problem triggered periodically; if you have a different problem, you should probably CC yourself on another bug or open a new one.

Stanislaw, unfortunately, your conjecture that specifying the BSSID MAC in NetworkManager is enough is false. I checked my NM and nm-applet configuration: it has had the BSSID MAC specified from day one, yet I still see "fail to flush". I appreciate it when each bug is very specific.
I also appreciate it when you follow the rules to the letter, but in this particular case, opening yet another request for the same issue, as you suggest, is just going to create another reference to a reference. Frankly, this all looks as bad as a Ponzi scheme. At some point somebody will have to man up and step in (I am looking at you, Intel whiphy gang). Enough of this already; just face it and deal with it, in this very context, here, please. :) Just a thought: what if NASA had no choice but to use wifi chips from one huge chipmaker for its Curiosity rover? Given that whiphy option from that company, they would have to run a twisted pair from Earth to Mars.

On I don't know how many different forums/blogs/bugs so far, I've seen people suggesting this is something specific to NetworkManager. It isn't. Reading this is getting a touch frustrating, as it makes me believe folks are chasing false solutions, thinking something is fixed that isn't, or that the problem is never going to be solved. I have two different iwl-1000 machines here, neither using NetworkManager, so you can remove that layer from the equation. I am using, quite simply, wpa_supplicant (-Dnl80211) and dhcpcd. That's it. No UI of any sort. No NetworkManager. Nada. I don't recall how far back this goes, but across 3.6.0, 3.6.2, 3.6.5, 3.6.6 and 3.6.8 the issue hasn't gone anywhere. Taking time to figure out the correct fix is perfectly OK in my book. Pulling random solutions out of nowhere is not: this is not NM, never has been, never will be.
For anyone interested, this is a (crappy) HP DV4; PCI bus data as follows:

    02:00.0 Network controller [0280]: Intel Corporation Centrino Wireless-N 1000 [Condor Peak] [8086:0084]
        Subsystem: Intel Corporation Centrino Wireless-N 1000 BGN [8086:1315]
        Capabilities: [140] Device Serial Number 00-1e-64-ff-ff-2b-90-38

The cynic in me thinks that the way this is going to be "fixed" is by a future commit that simply removes the logging of this error: the error will still occur, but it simply won't be logged. I hope I am wrong about that...

Experiencing the same issues with Intel Corporation Centrino Wireless-N 1030 [Rainbow Peak] (rev 34). I could provide debug info if needed or if it would help.

I think I see the same problem on a stock 3.8.0-1rc4--mainline-dirty build from http://sakuscans.com/pacmanpkg/x86_64/, with lots of "iwlwifi 0000:03:00.0: fail to flush all tx fifo queues" messages. IIUC, the Arch wiki (https://wiki.archlinux.org/index.php/ThinkPad_X230#Suspension) seems to imply running an "optimized kernel", which, to be honest, I'm not happy about doing.

    x220:~$ lspci -s 03:00.0 -vvv
    03:00.0 Network controller: Intel Corporation Centrino Advanced-N 6205 [Taylor Peak] (rev 34)
        Subsystem: Intel Corporation Centrino Advanced-N 6205 AGN
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 47
        Region 0: Memory at f1500000 (64-bit, non-prefetchable) [size=8K]
        Capabilities: <access denied>
        Kernel driver in use: iwlagn

I'm getting this message here as well:

    iwlwifi 0000:03:00.0: fail to flush all tx fifo queues

Hardware is:

    03:00.0 Network controller: Intel Corporation Centrino Advanced-N 6205 [Taylor Peak] (rev 34)

Saw it on all 3.7.x kernels (I'm on 3.7.6 now) and on 3.6.11.

I'm seeing this in 3.8 on Fedora.
Specifically:

    Linux athena 3.8.3-103.fc17.x86_64 #1 SMP Mon Mar 18 15:46:01 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

A quick Google search didn't turn up any bug reports on intel.com. Does anyone know:

1. whether this has been reported directly to them,
2. if so, how to bug them about it, and
3. if not, how to report it?

I can't seem to find any references to a bug-handling system on their site.

2 & 3: See MAINTAINERS in the kernel tree.

Sent an e-mail directly to the maintainers and mentioned this bug report.

Having similar problems on Debian stable, using a vanilla 3.12.6 kernel. Will attach a dmesg and lspci.

Created attachment 121731 [details]
andrewd18_dmesg_output
Created attachment 121741 [details]
lspci -vv output
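The thread disagrees on whether the failures track NetworkManager's periodic scan (every 5 minutes, matching the 300 seconds in the summary) or occur at random, as several commenters report. As a minimal sketch, assuming a plain-text dmesg log with the default `[seconds.micros]` timestamps (not `dmesg -T`), the script below extracts the "fail to flush all tx fifo queues" lines and checks whether the gaps between them cluster around multiples of 300 s. The 5 s burst window, the 15 s tolerance and the script name flush_gaps.py are hypothetical choices, not values taken from iwlwifi or NetworkManager.

```python
import re
import sys

# Assumption: the log uses dmesg's default "[seconds.micros]" prefix,
# i.e. it was saved with plain `dmesg`, not `dmesg -T`.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")
NEEDLE = "fail to flush all tx fifo queues"


def failure_times(lines):
    """Return the timestamps (seconds since boot) of tx fifo flush failures."""
    times = []
    for line in lines:
        m = LINE_RE.match(line)
        if m and NEEDLE in m.group(2):
            times.append(float(m.group(1)))
    return times


def collapse_bursts(times, window=5.0):
    """Merge failures arriving less than `window` s after the previous one into one event."""
    events = []
    last = None
    for t in sorted(times):
        if last is None or t - last >= window:
            events.append(t)
        last = t
    return events


def gaps(events):
    """Seconds between consecutive events."""
    return [b - a for a, b in zip(events, events[1:])]


def looks_periodic(events, period=300.0, tolerance=15.0):
    """True if every gap lies within `tolerance` s of a whole multiple of `period`."""
    intervals = gaps(events)
    if not intervals:
        return False
    for gap in intervals:
        k = max(1, round(gap / period))  # allow skipped periods (2x, 3x, ...)
        if abs(gap - k * period) > tolerance:
            return False
    return True


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        events = collapse_bursts(failure_times(log))
    for gap in gaps(events):
        print(f"{gap:8.1f} s since previous failure")
    print("consistent with a 300 s period:", looks_periodic(events))
```

Usage would be along the lines of `dmesg > dmesg.txt` followed by `python3 flush_gaps.py dmesg.txt`. Gaps that are whole multiples of the period are accepted on the assumption that not every scan necessarily triggers the error. A `False` result only says the failures are not tied to a 5-minute cycle; it does not identify the cause. Gaps spanning a suspend may also be unreliable, since the dmesg timestamp may not count time spent suspended.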
There is a W/A in 3.14 - please test 3.14.

No information. Closing as duplicate of 56581.

*** This bug has been marked as a duplicate of bug 56581 *** |