Created attachment 176971 [details]
See Description, # Additional Information
The iwlwifi module will sometimes crash at random intervals after the system has been on for a little while. dmesg has the line `Error sending REPLY_ADD_STA: time out after 2000ms.' immediately at the point of the crash. I am unable to unload the module. Attempting to forcefully unload it results in a kernel panic/system freeze.
This problem has occurred only a few times over the past half a year; however, it is now incredibly frequent and makes this workstation/laptop annoying to use.
I am unable to reproduce this issue on demand, as it occurs randomly: sometimes when there is little network activity, and sometimes when there is a lot. The iwlwifi module crashes not just when connected to my home wireless access point, but also to any other wireless access point (whether it be at the University [of Sydney], McDonalds, etc).
# Actual Results
My device disconnects from the wireless access point and is unable to reconnect. It is also unable to see any other networks in the area and is unable to attempt to connect to them. I am using the GNOME 3(.16) desktop environment, and where it shows the signal strength in the top-right pulldown menu, it shows a blank white square instead. Attempting to use the hardware switch to turn the wireless device on and off has no effect, nor does trying a software reset of the device.
In order to get the device in a usable state, I have to restart it.
# First Encountered
Many months ago. I cannot remember an exact date, sorry :-( It occurred on kernel versions prior to 3.18. However, for some reason, it is much more frequent now.
# Additional Information
As I could only attach one file, I decided to tar up all of the following files:
* dmesg1.txt, dmesg2.txt - A couple of dmesg outputs from when the problem has occurred (look to the bottom of the file).
* rfkill.txt - The results of rfkill when the problem has happened (I think this is irrelevant, as the output is exactly the same when the problem has occurred, and when it hasn't).
* modinfogood.txt, modinfobad.txt - Module info (modinfo) when there's no problem, and when there is a problem (these were taken at different reboots).
* lsmod.txt - `lsmod' output.
* lspci.txt - `lspci' output.
* lspci_long.txt - `lspci -vvknnqq' output.
* lsusb.txt - `lsusb' output.
* lscpu.txt - `lscpu' output.
* cpuinfo.txt - `cat /proc/cpuinfo' output.
* meminfo.txt - `cat /proc/meminfo' output.
* ver_linux.txt - Output produced from `scripts/ver_linux'.
I am running a Clevo P150EM (known as Sager NP9150 in the United States), with an Intel Centrino Advanced-N 6235 wireless card ( http://www.intel.com/content/www/us/en/wireless-products/centrino-advanced-n-6235.html ).
# Anything Else?
If there is anything else you need to know, or need me to try, please let me know. :-) I hope I have done everything right.
Unfortunately you are hitting a bug that was closed as will not fix. See the info there.
*** This bug has been marked as a duplicate of bug 91171 ***
(In reply to Emmanuel Grumbach from comment #1)
> Unfortunately you are hitting a bug that was closed as will not fix. See the
> info there.
> *** This bug has been marked as a duplicate of bug 91171 ***
I fail to see how it is? My laptop is stationary the entire time, I do not move it from place to place and the problem occurs. If I do move the laptop to another location, the connection does not drop as it did for those in bug #91171.
Furthermore, I experience no such issue on Windows 7, only Linux. Not to mention I have had this computer for 2 years, with Linux running on it off and on, and only recently (couple of months) have I experienced this issue. I experience no issue with other hardware, only wireless.
Are you sure it's the same bug?
Can you please attach the output of `sudo lspci -xxxx -vvvv' before and after the failure?
You are not the first one complaining about this issue happening more recently. Someone even tried to bisect with no luck.
Did you update your BIOS?
I am pretty sure it is the same bug since the driver can't access the device in both issues.
Created attachment 177091 [details]
Output of `lspci -xxxx -vvvv' before the problem.
Here is the output of the command `lspci -xxxx -vvvv' before the problem.
I'll have to wait until the problem occurs again before I can grab the output of that command after the failure. This could take anywhere from a few minutes, to a few hours, to a few days.
(In reply to Emmanuel Grumbach from comment #3)
> Did you update your BIOS?
I have not updated my BIOS since I first got the computer a couple of years ago.
Please run lspci with root permissions. I mentioned you need sudo.
Created attachment 177101 [details]
Output of `sudo lspci -xxxx -vvvv' before the problem.
Hi, sorry for the late response, I ran into a couple of issues when piping the output of lspci to a file, namely: `pcilib: sysfs_read_vpd: read failed: Connection timed out'. dmesg had `rtsx_pci 0000:03:00.0: vpd r/w failed. This is likely a firmware bug on this device. Contact the card vendor for a firmware update' appended to the end of it each time this happened.
Anyways, I ran it under root permissions as you said (sorry about that the first time), but it appears to be mostly the same as before?
Can you do the same with Windows?
You can use read write anything. This application will dump the config space of the device.
I'd like to compare them.
But I am very pessimistic.
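For anyone wanting to do the same comparison locally, here is a minimal sketch of how two saved `lspci -xxxx' dumps could be diffed byte by byte. It assumes the usual lspci layout (a device header line such as `02:00.0 Network controller: ...' followed by hex rows of the form `00: 86 80 ...'). The function names are made up for illustration and are not part of any existing tool. A device that the kernel can no longer reach will typically read back as all ff, which this kind of diff makes easy to spot.

```python
import re

# Header line of a device in lspci output, e.g. "02:00.0 Network controller: ..."
# (an optional "0000:" PCI domain prefix is tolerated).
DEVICE_RE = re.compile(r"^((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s")
# Hex dump row, e.g. "00: 86 80 8e 08 ..." (offset, then up to 16 bytes).
HEX_RE = re.compile(r"^([0-9a-f]{2,3}):((?: [0-9a-f]{2}){1,16})\s*$")


def parse_config_space(text):
    """Map each device address to {offset: byte} from `lspci -x...` output."""
    devices = {}
    current = None
    for line in text.splitlines():
        match = DEVICE_RE.match(line)
        if match:
            current = devices.setdefault(match.group(1), {})
            continue
        match = HEX_RE.match(line)
        if match and current is not None:
            base = int(match.group(1), 16)
            for index, byte in enumerate(match.group(2).split()):
                current[base + index] = int(byte, 16)
    return devices


def diff_config_space(before, after):
    """Return {address: [(offset, old, new), ...]} for bytes that differ."""
    changes = {}
    for address in sorted(set(before) & set(after)):
        old, new = before[address], after[address]
        diffs = [
            (offset, old[offset], new[offset])
            for offset in sorted(set(old) & set(new))
            if old[offset] != new[offset]
        ]
        if diffs:
            changes[address] = diffs
    return changes
```

Usage would be along the lines of `diff_config_space(parse_config_space(open("before.txt").read()), parse_config_space(open("after.txt").read()))', where the two file names are placeholders for the saved dumps.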
Created attachment 177351 [details]
Output of `sudo lspci -xxxx -vvvv' after the problem.
Here's the output of `sudo lspci -xxxx -vvvv' after the problem had occurred. Sorry it's been a little while; I've been extremely busy.
If you'd still like, I should be able to provide you with the output on Windows on the weekend?
Yeah, you can, but again, I can't say anything regarding the likelihood that we will be able to do something with it.
That's understandable. If the problem is still occurring in a couple of months, I will try and give bisection a go.
I just noticed this webpage: https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging
If it would help, would you like me to provide you with a trace or a firmware dump?
No - the problem is really a bus problem. BTW - can you attach a dmesg + lspci *after* the problem occurs? I'd like to correlate them. I am afraid you might be seeing several issues.
Created attachment 177381 [details]
Tar'd file containing the stdout of `lspci' and `dmesg'.
For some reason, the problem was really really bad today. As a result, I've managed to snatch up many log files. Each dmesg text file corresponds to the lspci text file of the same number.
For both lspci and dmesg logs number 11, I decided to try the supposed workaround mentioned in #91171:
echo 1 > /sys/bus/devices/0000\:00\:03.0/remove
echo 1 > /sys/bus/pci/rescan
That, however, did not work.
I have also uploaded a video to YouTube showing what happens immediately after the problem occurs (I had just logged in and left the machine idle for a couple of minutes; the recording starts once the problem had appeared):
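For reference, a rough sketch of the remove/rescan sequence as a script. This assumes the standard sysfs layout, where PCI devices normally live under `/sys/bus/pci/devices/<address>/' and the bus-wide rescan file is `/sys/bus/pci/rescan'. The address `0000:02:00.0' in the usage note is only a placeholder and the function names are made up. By default nothing is written, and the planned writes are just returned so they can be checked first.

```python
def sysfs_writes(address, root="/sys/bus/pci"):
    """List the (path, value) writes for a remove followed by a bus rescan."""
    return [
        (f"{root}/devices/{address}/remove", "1"),
        (f"{root}/rescan", "1"),
    ]


def remove_and_rescan(address, dry_run=True):
    """Detach one PCI device, then ask the kernel to re-enumerate the bus.

    With dry_run=True (the default) nothing is touched. Actually performing
    the writes needs root, and the address must match the real device.
    """
    writes = sysfs_writes(address)
    if not dry_run:
        for path, value in writes:
            with open(path, "w") as handle:
                handle.write(value)
    return writes
```

Calling `remove_and_rescan("0000:02:00.0")' only returns the two planned writes; passing `dry_run=False' as root would perform them.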
Created attachment 177451 [details]
Tar'd file containing the stdout of `lspci' and `dmesg'.
It happened again. Decided to try a much older version.
This is using archlinux-2014.05.01.iso (kernel 3.14). It printed a lot of text, different from the earlier output under the newer kernels.
This does not bode well :-(
I am not surprised.
Re-closing as duplicate.