Bug 91171

Summary: iwlwifi: Timeout waiting for hardware access (CSR_GP_CNTRL 0xffffffff) - MWG100226667
Product: Drivers Reporter: Sait (sait.a.umar)
Component: network-wireless Assignee: drivers_network-wireless (drivers_network-wireless)
Status: CLOSED WILL_NOT_FIX    
Severity: normal CC: crayzeewulf, gredner, ilw, jernej.azarija, johannes.hirte, kernel, kichawa23, kjslag, lvlrdka22, mayazcherquoi, me, omazeas, piccinini.santiago, rehan.malak, scott, smithbone, sqweek, v.shevlyakov, vasyl.demin, vi0oss
Priority: P1    
Hardware: All   
OS: Linux   
See Also: https://bugzilla.kernel.org/show_bug.cgi?id=95811
Kernel Version: 3.18.3 Subsystem:
Regression: No Bisected commit-id:
Attachments: dmesg output
dmesg_bad output
/var/log/messages
lspci.log
New lspsci after setpci commands
dmesg with aspm off
dmesg with aspm off after network loss
Take 2
Take 2
lspci with ASPM off
dmesg after rescan attempt
dmesg after rescan
dmesg without removing device before suspend
dmesg with removing device before suspend
dmesg without removing device before suspend, patch applied
dmesg log of first iwlwifi crash on LENOVO 3626C13
dmesg-4.8.13
lspci_lenovo_x220

Description Sait 2015-01-11 20:41:02 UTC
As the kernel/firmware improved the usual connection drops have stopped
happening EXCEPT when I walk from kitchen to the living room with the
laptop. They are about 25 feet apart and both locations have very strong signal
and if one is stationary in either location the connection stays on indefinitely.

This happens every time, without exception. Each time I get a message in the system log:

[27334.154521] iwlwifi 0000:03:00.0: Failed to wake NIC for hcmd
[27334.154573] iwlwifi 0000:03:00.0: Error sending SCAN_OFFLOAD_REQUEST_CMD: enqueue_hcmd failed: -5
[27334.154589] iwlwifi 0000:03:00.0: Scan failed! ret -5
iwlwifi 0000:03:00.0: Failed to wake NIC for hcmd
iwlwifi 0000:03:00.0: Error sending SCAN_OFFLOAD_REQUEST_CMD: enqueue_hcmd failed: -5
iwlwifi 0000:03:00.0: Scan failed! ret -5

The laptop was not in sleep mode or powersave mode. I have
iwconfig wlp3s0 power off
in my rc.local and it is disabled. The problem is independent of this though.
Dmesg gives:
Detected Intel(R) Dual Band Wireless N 7260, REV=0x144

I am using kernel 3.17.8 from Fedora 21 but with the firmware 23.11.10.0:
iwlwifi 0000:03:00.0: loaded firmware version 23.11.10.0 op_mode iwlmvm

As I said connection is perfect except when you pick up the laptop and walk
to the next room.
Thanks
Comment 1 Emmanuel Grumbach 2015-01-12 06:22:00 UTC
Hi,

can you please share your full dmesg output?
The message you pasted means that the WiFi NIC is not responding to the driver commands. In order to debug that, I'll need to compile a special version of the firmware and driver so that you can collect the relevant firmware logs.
Comment 2 Sait 2015-01-12 22:00:02 UTC
Created attachment 163331 [details]
dmesg output
Comment 3 Emmanuel Grumbach 2015-01-13 04:31:14 UTC
I can't see any problem here..
Comment 4 Sait 2015-01-13 15:04:32 UTC
Ok. I did not realize you wanted one after the problem happens. Will this show
anything? I will try to do this when I get home.
Comment 5 Sait 2015-01-14 00:39:29 UTC
Hi, I think there is some confusion here. dmesg output after the crash is the
same as before. The problem does not happen during the boot. Only when the
laptop is moved from one place to another place. Am I missing something?
Comment 6 Emmanuel Grumbach 2015-01-14 03:22:31 UTC
Yes. You are missing the fact that dmesg outputs the latest kernel logs, not the boot logs.
Comment 7 Sait 2015-01-14 22:21:34 UTC
OK. I was wrong regarding the kernel message in the initial post. That seems to
be unrelated.

I have started the computer in the kitchen....network working...save dmesg_good.out

I walked with the laptop to the next room...network stopped working.. dmesg_bad.out

There is no difference between the two dmesg outputs. However, the network is not
working while the NetworkManager thinks that it is.

I did update to kernel 3.18.2, which seems to make no difference.
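
A minimal sketch of the before/after comparison described in this comment, assuming both captures were taken with plain `dmesg` redirected to the file names used above. The helper name `new_kernel_lines` is hypothetical; it only prints the lines that appear in the second capture and not in the first.

```shell
#!/bin/sh
# Print kernel log lines that are present in the "after" capture but not
# in the "before" capture.
# $1: capture taken while the network works (e.g. dmesg_good.out)
# $2: capture taken after the network stops (e.g. dmesg_bad.out)
new_kernel_lines() {
    before_file=$1
    after_file=$2
    # diff prefixes lines found only in the second file with "> ".
    # diff exits non-zero when the files differ, which is expected here.
    { diff "$before_file" "$after_file" || true; } | sed -n 's/^> //p'
}

# Typical use (commented out so that sourcing this file has no side effects):
#   dmesg > dmesg_good.out      # while the network works
#   dmesg > dmesg_bad.out       # after the network stops
#   new_kernel_lines dmesg_good.out dmesg_bad.out
```

If the helper prints nothing, the kernel logged nothing new between the two captures, which matches the observation above.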
Comment 8 Emmanuel Grumbach 2015-01-14 22:24:28 UTC
please attach your dmesg

In your original report you pasted the "Failed to wake..." message.

Where did you paste that from?
Comment 9 Sait 2015-01-14 22:36:06 UTC
Created attachment 163511 [details]
dmesg_bad output
Comment 10 Sait 2015-01-14 22:36:56 UTC
I got these from /var/log/messages. They seem to come after the computer is on for
a while. The test above of starting and moving the computer right after that 
does not seem to write those messages even though the network connection stops.
I am attaching the latest dmesg after stop.

Looking into /var/log/messages I also saw these repeated many times but a few
days ago, nothing to do with the current stoppage:
====================
Jan 12 18:02:16 Thinkpad kernel: iwlwifi 0000:03:00.0: fail to flush all tx fifo queues Q 0
Jan 12 18:02:16 Thinkpad kernel: iwlwifi 0000:03:00.0: Current SW read_ptr 150 write_ptr 151
Jan 12 18:02:16 Thinkpad kernel: ------------[ cut here ]------------
Jan 12 18:02:16 Thinkpad kernel: WARNING: CPU: 2 PID: 678 at drivers/net/wireless/iwlwifi/pcie/trans.c:1266 iwl_trans_pcie_grab_nic_access+0xf9/0x110 [iwlwifi]()
Jan 12 18:02:16 Thinkpad kernel: Timeout waiting for hardware access (CSR_GP_CNTRL 0xffffffff)
Jan 12 18:02:16 Thinkpad kernel: Modules linked in: fuse ccm bnep vfat fat arc4 snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi uvcvideo intel_rapl videobuf2_vmalloc x86_pkg_temp_thermal iTCO_wdt coretemp iwlmvm iTCO_vendor_support mac80211 videobuf2_memops videobuf2_core kvm v4l2_common btusb iwlwifi videodev bluetooth crct10dif_pclmul crc32_pclmul snd_hda_intel cfg80211 media snd_hda_controller crc32c_intel snd_hda_codec ghash_clmulni_intel joydev snd_hwdep serio_raw snd_seq rtsx_pci_ms snd_seq_device memstick snd_pcm thinkpad_acpi e1000e wmi rfkill tpm_tis mei_me i2c_i801 tpm mei snd_timer snd shpchp ptp pps_core lpc_ich soundcore i915 rtsx_pci_sdmmc mmc_core i2c_algo_bit drm_kms_helper drm rtsx_pci mfd_core video
Jan 12 18:02:16 Thinkpad kernel: CPU: 2 PID: 678 Comm: wpa_supplicant Not tainted 3.17.8-300.fc21.x86_64 #1
Jan 12 18:02:16 Thinkpad kernel: Hardware name: LENOVO 20AQCTO1WW/20AQCTO1WW, BIOS GJET80WW (2.30 ) 10/20/2014
Jan 12 18:02:16 Thinkpad kernel: 0000000000000000 000000000b05f82b ffff880206b0b6e0 ffffffff81741aea
Jan 12 18:02:16 Thinkpad kernel: ffff880206b0b728 ffff880206b0b718 ffffffff810970bd ffff880211544000
Jan 12 18:02:16 Thinkpad kernel: 0000000000000000 ffff880211547bb8 ffff880206b0b7c8 0000000000000000
Jan 12 18:02:16 Thinkpad kernel: Call Trace:
Jan 12 18:02:16 Thinkpad kernel: [<ffffffff81741aea>] dump_stack+0x45/0x56
Jan 12 18:02:16 Thinkpad kernel: [<ffffffff810970bd>] warn_slowpath_common+0x7d/0xa0
Jan 12 18:02:16 Thinkpad kernel: [<ffffffff8109713c>] warn_slowpath_fmt+0x5c/0x80
Jan 12 18:02:16 Thinkpad kernel: [<ffffffffa0516f39>] iwl_trans_pcie_grab_nic_access+0xf9/0x110 [iwlwifi]
Jan 12 18:02:16 Thinkpad kernel: [<ffffffffa05156ce>] iwl_trans_pcie_read_mem+0x3e/0xd0 [iwlwifi]
Jan 12 18:02:16 Thinkpad kernel: [<ffffffffa0515af2>] iwl_trans_pcie_wait_txq_empty+0x232/0x4c0 [iwlwifi]
Jan 12 18:02:16 Thinkpad kernel: [<ffffffff81744719>] ? schedule_preempt_disabled+0x29/0x70
Jan 12 18:02:16 Thinkpad kernel: [<ffffffff8174688b>] ? __mutex_lock_slowpath+0x14b/0x210
Jan 12 18:02:16 Thinkpad kernel: [<ffffffffa071fbaf>] iwl_mvm_mac_flush+0xbf/0x140 [iwlmvm]
Jan 12 18:02:16 Thinkpad kernel: [<ffffffffa0557e61>] ieee80211_flush_queues+0xa1/0x180 [mac80211]
Jan 12 18:02:16 Thinkpad kernel: [<ffffffffa05711fa>] ieee80211_set_disassoc+0x2ca/0x390 [mac80211]
Jan 12 18:02:16 Thinkpad kernel: [<ffffffffa0575636>] ieee80211_mgd_deauth+0xf6/0x280 [mac80211]
Jan 12 18:02:16 Thinkpad kernel: [<ffffffff813150db>] ? cred_has_capability+0x6b/0x130
Jan 12 18:02:16 Thinkpad kernel: [<ffffffffa05447a8>] ieee80211_deauth+0x18/0x20 [mac80211]
Jan 12 18:02:16 Thinkpad kernel: [<ffffffffa039016e>] cfg80211_mlme_deauth+0x9e/0x150 [cfg80211]
Jan 12 18:02:16 Thinkpad kernel: [<ffffffffa037712e>] nl80211_deauthenticate+0xde/0x120 [cfg80211]
Jan 12 18:02:16 Thinkpad kernel: [<ffffffff81665b8c>] genl_family_rcv_msg+0x1bc/0x3e0
Jan 12 18:02:16 Thinkpad kernel: [<ffffffff81665db0>] ? genl_family_rcv_msg+0x3e0/0x3e0
Jan 12 18:02:16 Thinkpad kernel: [<ffffffff81665e29>] genl_rcv_msg+0x79/0xc0
Jan 12 18:02:16 Thinkpad kernel: [<ffffffff81665439>] netlink_rcv_skb+0xa9/0xd0
Jan 12 18:02:16 Thinkpad kernel: [<ffffffff816659b8>] genl_rcv+0x28/0x40
Jan 12 18:02:16 Thinkpad kernel: [<ffffffff81664a4a>] netlink_unicast+0x12a/0x1a0
Jan 12 18:02:16 Thinkpad kernel: [<ffffffff81664e11>] netlink_sendmsg+0x351/0x7c0
Jan 12 18:02:16 Thinkpad kernel: [<ffffffff81311f22>] ? sock_has_perm+0x72/0x90
Jan 12 18:02:16 Thinkpad kernel: [<ffffffff81221e02>] ? poll_select_copy_remaining+0xd2/0x150
Jan 12 18:02:16 Thinkpad kernel: [<ffffffff816172ce>] sock_sendmsg+0x9e/0xe0
Jan 12 18:02:16 Thinkpad kernel: [<ffffffff81616f9e>] ? move_addr_to_kernel.part.20+0x1e/0x70
Jan 12 18:02:16 Thinkpad kernel: [<ffffffff81617ea1>] ? move_addr_to_kernel+0x21/0x30
Jan 12 18:02:16 Thinkpad kernel: [<ffffffff81617738>] ___sys_sendmsg+0x3c8/0x3e0
Jan 12 18:02:16 Thinkpad kernel: [<ffffffff810a27c7>] ? recalc_sigpending+0x17/0x60
Jan 12 18:02:16 Thinkpad kernel: [<ffffffff810a3311>] ? __set_task_blocked+0x41/0xa0
Jan 12 18:02:16 Thinkpad kernel: [<ffffffff810a60a6>] ? __set_current_blocked+0x36/0x60
Jan 12 18:02:16 Thinkpad kernel: [<ffffffff810a6164>] ? signal_setup_done+0x74/0xc0
Jan 12 18:02:16 Thinkpad kernel: [<ffffffff810209c2>] ? __restore_xstate_sig+0xa2/0x660
Jan 12 18:02:16 Thinkpad kernel: [<ffffffff81013758>] ? do_signal+0x1b8/0x800
Jan 12 18:02:16 Thinkpad kernel: [<ffffffff81618651>] __sys_sendmsg+0x51/0x90
Jan 12 18:02:16 Thinkpad kernel: [<ffffffff816186a2>] SyS_sendmsg+0x12/0x20
Jan 12 18:02:16 Thinkpad kernel: [<ffffffff81748ca9>] system_call_fastpath+0x16/0x1b
Jan 12 18:02:16 Thinkpad kernel: ---[ end trace 93a6d40638fa6cad ]---
Jan 12 18:02:16 Thinkpad kernel: iwl data: 00000000: b8 da 5c 08 02 88 ff ff b8 da 5c 08 02 88 ff ff  ..\.......\.....
Jan 12 18:02:16 Thinkpad kernel: iwlwifi 0000:03:00.0: FH TRBs(0) = 0x5a5a5a5a
Jan 12 18:02:16 Thinkpad kernel: iwlwifi 0000:03:00.0: FH TRBs(1) = 0x5a5a5a5a
Jan 12 18:02:16 Thinkpad kernel: iwlwifi 0000:03:00.0: FH TRBs(2) = 0x5a5a5a5a
Jan 12 18:02:16 Thinkpad kernel: iwlwifi 0000:03:00.0: FH TRBs(3) = 0x5a5a5a5a
Jan 12 18:02:16 Thinkpad kernel: iwlwifi 0000:03:00.0: FH TRBs(4) = 0x5a5a5a5a
Jan 12 18:02:16 Thinkpad kernel: iwlwifi 0000:03:00.0: FH TRBs(5) = 0x5a5a5a5a
Jan 12 18:02:16 Thinkpad kernel: iwlwifi 0000:03:00.0: FH TRBs(6) = 0x5a5a5a5a
Jan 12 18:02:16 Thinkpad kernel: iwlwifi 0000:03:00.0: FH TRBs(7) = 0x5a5a5a5a
Jan 12 18:02:17 Thinkpad kernel: ------------[ cut here ]------------
Comment 11 Sait 2015-01-15 00:17:14 UTC
OK...the NIC message comes if I leave the laptop on in the kitchen for a 
while and then come and pick it up and go to the living room.
==================================
Jan 14 18:08:22 Thinkpad kernel: [ 7961.370904] thinkpad_acpi: EC reports that Thermal Table has changed
Jan 14 18:08:22 Thinkpad kernel: thinkpad_acpi: EC reports that Thermal Table has changed
Jan 14 18:09:08 Thinkpad kernel: [ 8008.364302] iwlwifi 0000:03:00.0: Failed to wake NIC for hcmd
Jan 14 18:09:08 Thinkpad kernel: [ 8008.364336] iwlwifi 0000:03:00.0: Error sending SCAN_OFFLOAD_REQUEST_CMD: enqueue_hcmd failed: -5
Jan 14 18:09:08 Thinkpad kernel: [ 8008.364341] iwlwifi 0000:03:00.0: Scan failed! ret -5
Jan 14 18:09:08 Thinkpad kernel: iwlwifi 0000:03:00.0: Failed to wake NIC for hcmd
Jan 14 18:09:08 Thinkpad kernel: iwlwifi 0000:03:00.0: Error sending SCAN_OFFLOAD_REQUEST_CMD: enqueue_hcmd failed: -5
Jan 14 18:09:08 Thinkpad kernel: iwlwifi 0000:03:00.0: Scan failed! ret -5
Jan 14 18:09:11 Thinkpad dbus[679]: [system] Activating via systemd: service name='net.reactivated.Fprint' unit='fprintd.service'
Jan 14 18:09:11 Thinkpad dbus[679]: [system] Successfully activated service 'net.reactivated.Fprint'
Jan 14 18:09:11 Thinkpad fprintd: Launching FprintObject
Jan 14 18:09:11 Thinkpad fprintd: ** Message: D-Bus service launched with name: net.reactivated.Fprint
Jan 14 18:09:11 Thinkpad fprintd: ** Message: entering main loop
Comment 12 Emmanuel Grumbach 2015-01-15 08:28:26 UTC
Sait - please stop pasting parts of files.
You are providing small pieces of information; it is absolutely impossible to understand what is happening this way.

Please attach full files to the bug so that I have the full picture.

thanks.
Comment 13 Sait 2015-01-16 01:12:50 UTC
Created attachment 163571 [details]
/var/log/messages

This is the full /var/log/messages from first boot, computer operates for a while,
carried to the next room (network stops).

By the way, the original "Failed to wake NIC for hcmd" does appear if I try to
access the network (start firefox). Otherwise the network is down but the
message is not written.
Comment 14 Emmanuel Grumbach 2015-01-16 09:17:41 UTC
Please do the following:

sudo lspci -xxxx -vvvv > lspci.log

And attach lspci.log to the bug

Thanks.
Comment 15 Sait 2015-01-16 19:50:31 UTC
Created attachment 163661 [details]
lspci.log

Attached lspci.log
Comment 16 Sait 2015-01-17 20:18:57 UTC
Hi, I just installed kernel 3.18.3 and the problem is gone. I have tested it
a few times and it never dropped connection. I see some patches in 3.18.3.
If it happens again in the next few days I will report it here.
Thanks,
Comment 17 Emmanuel Grumbach 2015-01-17 20:24:40 UTC
Ok - this is weird since I don't see how the patches that are in 3.18.3 could possibly solve this, but if the problem is gone, I am not going to complain.
Comment 18 Sait 2015-01-17 20:30:56 UTC
Originally my COMCAST router signal was too weak in the kitchen and with older
firmware the connection would drop (without moving the computer). Then I bought
a Netgear repeater and put it into the living room which made the signal strong
everywhere. I think it was after this that the connection started to drop only
after moving the computer.
Comment 19 Sait 2015-01-17 21:26:14 UTC
Oops..it happened again. The computer was in the kitchen for about 20min, I came
back and carried it to the next room, connection dropped with the same message.
However, it is not always reproducible any more. I went back and forth 4 times
and it never dropped the connection. Earlier, I did leave it standing for a while
too and it still worked. So, something seems to have improved but not completely.
Comment 20 Emmanuel Grumbach 2015-01-18 10:43:06 UTC
can you try to do this?

setpci -s 03:00.0 0x160.B=0x00
setpci -s 00:1c.1 0x204.B=0x10


Note - these settings will go away after reboot, you'll need to do this after each reboot.
Let me know if it helps. Thanks
Comment 21 Sait 2015-01-18 20:42:49 UTC
I executed those commands and after a while it happened again.
Comment 22 Emmanuel Grumbach 2015-01-18 20:59:18 UTC
please attach again the output of sudo lspci -vvvv -xxxx after you executed the commands.

Thanks.
Comment 23 Sait 2015-01-18 21:22:38 UTC
Created attachment 163731 [details]
New lspsci after setpci commands
Comment 24 Emmanuel Grumbach 2015-01-19 06:06:44 UTC
ok- please try to boot with pcie_aspm=off
Comment 25 Sait 2015-01-20 00:26:04 UTC
Created attachment 163851 [details]
dmesg with aspm off

I did that (after removing the setpci commands) and same thing happened. I am
attaching dmesg before and after.
Comment 26 Sait 2015-01-20 00:26:58 UTC
Created attachment 163861 [details]
dmesg with aspm off after network loss
Comment 27 Emmanuel Grumbach 2015-01-20 04:34:23 UTC
Nope. The setting didn't work. You didn't enter the kernel boot parameter properly.
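
One way to check whether the parameter reached the running kernel is to look at `/proc/cmdline`, which holds the command line the kernel was actually booted with. The sketch below is only an illustration; the function name `aspm_off_requested` is hypothetical, and it checks only that the parameter was passed, not what the hardware ended up doing.

```shell
#!/bin/sh
# Succeed (exit status 0) if pcie_aspm=off is present on a kernel
# command line.
# $1: file holding the command line; defaults to /proc/cmdline
aspm_off_requested() {
    cmdline_file=${1:-/proc/cmdline}
    # Split the command line into one parameter per line and look for an
    # exact match, so that e.g. pcie_aspm=offx would not count.
    tr ' ' '\n' < "$cmdline_file" | grep -qx 'pcie_aspm=off'
}

# Typical use (commented out so that sourcing this file has no side effects):
#   if aspm_off_requested; then echo "parameter present"; else echo "parameter missing"; fi
```

If the parameter is missing here, the boot loader configuration that was edited is probably not the one the machine boots from.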
Comment 28 Sait 2015-01-20 14:16:32 UTC
Hmmmm....I put it on the kernel line in grub2.cfg....I will check it again when
I get home. How about this line in dmesg:

ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
Comment 29 Sait 2015-01-21 00:32:54 UTC
Created attachment 163971 [details]
Take 2

Ok...this laptop uses grub-efi, so it has another configuration file. That confused me
because they both get updated when I update the kernel. Anyway, I am attaching
the new logs.
Comment 30 Sait 2015-01-21 00:33:20 UTC
Created attachment 163981 [details]
Take 2
Comment 31 Emmanuel Grumbach 2015-01-21 22:03:36 UTC
This is really weird - the kernel claims ASPM is disabled but iwlwifi sees it enabled...

Can you please send the output of sudo lspci -vvvv -xxxx in this configuration?

This looks like a PCI bus issue.
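
For readers who want to compare the two views themselves, here is a small sketch that extracts the ASPM state from the `LnkCtl` line of `lspci -vv` output. The function name `link_aspm_state` is hypothetical, and the exact layout of the `LnkCtl` line can vary between pciutils versions, so treat the pattern as an assumption.

```shell
#!/bin/sh
# Read `lspci -vv` output on stdin and print the ASPM part of each
# LnkCtl (link control) line, for example "ASPM L1 Enabled" or
# "ASPM Disabled".
link_aspm_state() {
    sed -n 's/.*LnkCtl:[[:space:]]*\(ASPM[^;]*\);.*/\1/p'
}

# Typical use as root, with the device address from this report
# (commented out so that sourcing this file has no side effects):
#   lspci -vv -s 03:00.0 | link_aspm_state
#   dmesg | grep -E 'iwlwifi.*L1 (En|Dis)abled'
```

The first command shows what the PCI configuration space reports, the second what the driver logged when it started; a mismatch between them is the oddity described in this comment.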
Comment 32 Sait 2015-01-21 22:55:15 UTC
Created attachment 164251 [details]
lspci with ASPM off
Comment 33 Emmanuel Grumbach 2015-01-29 09:34:26 UTC
This is really weird - I am involving the PCI maintainers.
Comment 34 Sait 2015-01-29 14:21:48 UTC
Thanks...I saw many reports about ASPM and Lenovo. Apparently for unspecified
reasons they turn off ASPM from BIOS but don't have an option to turn it on.
Comment 35 Emmanuel Grumbach 2015-02-24 13:31:23 UTC
*** Bug 92231 has been marked as a duplicate of this bug. ***
Comment 36 Emmanuel Grumbach 2015-02-24 13:31:41 UTC
*** Bug 91651 has been marked as a duplicate of this bug. ***
Comment 37 Emmanuel Grumbach 2015-02-24 13:32:10 UTC
*** Bug 92381 has been marked as a duplicate of this bug. ***
Comment 38 Emmanuel Grumbach 2015-02-24 13:35:59 UTC
Does someone here have Windows on the same system?
Does Windows suffer from the same issue?
Comment 39 Emmanuel Grumbach 2015-03-15 15:25:56 UTC
Would it be possible to get the required data?
I will close the bug in 2 days unless I get the required data. Of course, a closed bug can be re-opened.
Comment 40 Sait 2015-03-15 16:36:16 UTC
I am not sure what data you are referring to. Perhaps someone from the
duplicate bugs? I do not have windows on the same system.
Comment 41 Emmanuel Grumbach 2015-03-16 06:23:01 UTC
Yes, but I can't do much without this data unfortunately.
Comment 42 Emmanuel Grumbach 2015-03-19 11:49:08 UTC
can you tell me if this allows you get WiFi back (as root):

echo 1 > /sys/bus/devices/0000\:00\:03.0/remove
echo 1 > /sys/bus/pci/rescan
killall wpa_supplicant
Comment 43 Sait 2015-03-19 23:35:40 UTC
First line needed a /pci/ in it.

No, unfortunately the connection went down but it never came up. In 
/var/log/messages I got these messages every time I tried:

Mar 19 18:26:59 Thinkpad kernel: iwlwifi 0000:03:00.0: Failed to wake NIC for hcmd
Mar 19 18:26:59 Thinkpad kernel: iwlwifi 0000:03:00.0: Error sending SCAN_OFFLOAD_REQUEST_CMD: enqueue_hcmd failed: -5
Mar 19 18:26:59 Thinkpad kernel: iwlwifi 0000:03:00.0: Scan failed! ret -5
Comment 44 Emmanuel Grumbach 2015-03-20 07:07:00 UTC
can you wait a bit between the commands?
Also - please attach the full dmesg after you run these.

I was hoping they could allow you to recover from the stalls.
Comment 45 Sait 2015-03-21 01:17:18 UTC
I waited about a minute between commands to no avail. dmesg shows a bunch
of errors (kernel 3.19.2 with the latest firmware). Thanks
Comment 46 Sait 2015-03-21 01:18:42 UTC
Created attachment 171531 [details]
dmesg after rescan attempt
Comment 47 Emmanuel Grumbach 2015-03-21 18:03:51 UTC
I am sorry, but I can't correlate your log and the execution of the commands.
It seems like you haven't run the commands from the log.
Comment 48 Emmanuel Grumbach 2015-03-21 18:04:25 UTC
Is there someone else following this bug who could help with trying these commands?
Comment 49 Emmanuel Grumbach 2015-03-21 18:05:20 UTC
(In reply to Emmanuel Grumbach from comment #42)
> can you tell me if this allows you get WiFi back (as root):
> 
> echo 1 > /sys/bus/devices/0000\:00\:03.0/remove
> echo 1 > /sys/bus/pci/rescan
> killall wpa_supplicant

Note that the 03.0 above depends on your system it may be another number.
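
A possible way to find the right address without reading lspci output: devices bound to a PCI driver appear as entries under that driver's sysfs directory. The sketch below lists them for iwlwifi; the function name `list_iwlwifi_devices` is hypothetical, and the optional argument exists only so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# Print the PCI address of every device currently bound to iwlwifi.
# $1: sysfs PCI root; defaults to /sys/bus/pci
list_iwlwifi_devices() {
    pci_root=${1:-/sys/bus/pci}
    for entry in "$pci_root"/drivers/iwlwifi/*; do
        name=${entry##*/}
        case "$name" in
            # Keep only entries that look like domain:bus:dev.fn and skip
            # control files such as bind, unbind or uevent.
            [0-9a-f][0-9a-f][0-9a-f][0-9a-f]:*) echo "$name" ;;
        esac
    done
}

# Typical use (commented out so that sourcing this file has no side effects):
#   list_iwlwifi_devices
```

The printed address is the one to substitute into the remove command quoted above.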
Comment 50 Sait 2015-03-21 18:36:52 UTC
Sorry for not checking the pci. The correct address was:
echo 1 > /sys/bus/pci/devices/0000\:03\:00.0/remove

Executing these commands from a working connection did exactly what
they are supposed to. The connection was removed, rescan brought back
the device, killall wpa_supplicant was not really necessary but executing
it took the connection down and brought it back up.

I will try to execute these after the system loses connection asap.
Comment 51 Reece Hart 2015-03-21 18:48:20 UTC
(In reply to Emmanuel Grumbach from comment #48)
> Is there someone else following this bug who could help with trying these
> commands?

Yep. I'm following. Thanks for the potential workaround. I haven't had the bug appear yet to give it a shot. I also see that the workaround appears confirmed by Sait in #50. Thanks for your perseverance, Emmanuel.
Comment 52 Sait 2015-03-21 20:55:27 UTC
Just tested after connection loss and it worked. The only difference was that
the remove line took about 15 seconds to complete. I will attach the new
dmesg file. Thanks.
Comment 53 Sait 2015-03-21 20:56:06 UTC
Created attachment 171571 [details]
dmesg after rescan
Comment 54 Jianlong Liu 2015-03-22 06:27:20 UTC
I think the bug I'm experiencing is the same one, please let me know if not and I'll open a separate report.

Mine happens if I just put my Dell Venue 11 Pro 7130 into suspend. Upon waking, the WiFi PCIe card always stops getting detected. Unlike previous versions, the device doesn't even appear in lspci or /sys/bus/pci/devices.
The card is (according to lspci) Intel Corporation Wireless 7260 (rev 83).

Since the device doesn't get listed in PCI devices after waking up, running "echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/remove" doesn't make sense. Rescanning doesn't make it appear either.
However, if I run "echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/remove" prior to suspend, it gets detected fine upon waking.
Comment 55 Jianlong Liu 2015-03-22 06:31:56 UTC
Created attachment 171611 [details]
dmesg without removing device before suspend

Hardware: Dell Venue 11 Pro, Intel 7260.
dmesg (starting with the initial wake log, for comparison) with calltrace when I don't run "echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/remove" prior to suspend.
The WiFi card doesn't appear in /sys/bus/pci/devices.
Comment 56 Jianlong Liu 2015-03-22 06:32:43 UTC
Created attachment 171621 [details]
dmesg with removing device before suspend

Hardware: Dell Venue 11 Pro, Intel 7260.
dmesg (starting with the initial wake log, for comparison) when I run "echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/remove" prior to suspend.
The WiFi card appears in /sys/bus/pci/devices and works fine.
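
The manual step above could be automated on systems that use systemd, which runs executables placed in its system-sleep directory with "pre" before suspend and "post" after resume as the first argument. The following is only a sketch under that assumption: the function name `wifi_sleep_hook` is hypothetical, and the rescan in the "post" branch is an addition that this comment does not say is required.

```shell
#!/bin/sh
# Remove the WiFi card before suspend and rescan the bus after resume.
# $1: "pre" (before suspend) or "post" (after resume)
# $2: PCI address of the WiFi card (0000:01:00.0 in the comment above)
# $3: sysfs PCI root; defaults to /sys/bus/pci
wifi_sleep_hook() {
    phase=$1
    pci_addr=$2
    pci_root=${3:-/sys/bus/pci}
    case "$phase" in
        pre)
            # Detach the card before the system goes to sleep.
            if [ -e "$pci_root/devices/$pci_addr/remove" ]; then
                echo 1 > "$pci_root/devices/$pci_addr/remove"
            fi
            ;;
        post)
            # Assumption: ask the kernel to re-enumerate the bus on wake.
            echo 1 > "$pci_root/rescan"
            ;;
    esac
}

# In an installed hook the last line would be (commented out here):
#   wifi_sleep_hook "$1" 0000:01:00.0
```

The PCI address is system specific, so it would need to be adjusted for other machines.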
Comment 57 Emmanuel Grumbach 2015-03-22 06:55:32 UTC
Thanks Jianlong for sharing this. I have to say this is the first time I hear about such a weird thing.

Can you and Sait try this:

diff --git a/drivers/net/wireless/iwlwifi/pcie/tx.c b/drivers/net/wireless/iwlwifi/pcie/tx.c
index 6817890..15591b8 100644
--- a/drivers/net/wireless/iwlwifi/pcie/tx.c
+++ b/drivers/net/wireless/iwlwifi/pcie/tx.c
@@ -697,9 +697,9 @@ void iwl_pcie_tx_start(struct iwl_trans *trans, u32 scd_base_addr)
                           reg_val | FH_TX_CHICKEN_BITS_SCD_AUTO_RETRY_EN);

        /* Enable L1-Active */
-       if (trans->cfg->device_family != IWL_DEVICE_FAMILY_8000)
-               iwl_clear_bits_prph(trans, APMG_PCIDEV_STT_REG,
-                                   APMG_PCIDEV_STT_VAL_L1_ACT_DIS);
+//     if (trans->cfg->device_family != IWL_DEVICE_FAMILY_8000)
+//             iwl_clear_bits_prph(trans, APMG_PCIDEV_STT_REG,
+//                                 APMG_PCIDEV_STT_VAL_L1_ACT_DIS);
 }

 void iwl_trans_pcie_tx_reset(struct iwl_trans *trans)
Comment 58 Sait 2015-03-22 21:17:12 UTC
I recompiled the Fedora kernel rpm 3.19.2 with the above patch but unfortunately
I experienced the same disconnect. Running the commands to remove the device
and rescan the PCI bus fixed the problem.
Comment 59 Sait 2015-03-22 23:23:48 UTC
Further comment...the disconnect does seem to happen less often...so far
1 out of 3 tests.
Comment 60 Jianlong Liu 2015-03-23 00:02:53 UTC
It didn't work for me either. The calltrace is very slightly different though.
Comment 61 Jianlong Liu 2015-03-23 00:06:09 UTC
Created attachment 171681 [details]
dmesg without removing device before suspend, patch applied

Hardware: Dell Venue 11 Pro, Intel 7260.
Patch applied to kernel 4.0 latest upstream.
Comment 62 Jianlong Liu 2015-03-23 00:07:42 UTC
I suppose I should also note, the device still disappears after resuming.
Comment 63 Emmanuel Grumbach 2015-03-23 19:22:44 UTC
thanks for the testing.
Comment 64 Emmanuel Grumbach 2015-03-29 19:59:21 UTC
*** Bug 95811 has been marked as a duplicate of this bug. ***
Comment 65 Emmanuel Grumbach 2015-04-11 19:44:02 UTC
*** Bug 96391 has been marked as a duplicate of this bug. ***
Comment 66 Cyril 2015-04-15 18:34:39 UTC
Ok, I don't know if it's related or a useless fact, but anyway: when I boot, it sometimes looks like I don't have any wifi card. Nothing shows up in lspci, and I get this message at the login screen: USB 4-1.5: device not accepting address 3: error -32.

USB 4-1.5 seems to be connected to my wifi card because it appears in dmesg when I use the rfkill switch.

When it happens, I have to reboot several times to get the wifi back again.
Comment 67 Emmanuel Grumbach 2015-04-15 19:22:21 UTC
I have asked quite a few people who are dealing with such issues.
These issues are platform issues. The driver can't do much.
For completeness, I have to say that I once found a workaround for a hardware bug, added it to the driver, and it solved issues of this kind. But I don't think there are any such workarounds that I am missing right now.

So, unfortunately, there isn't much I can do.

@Cyril, in your case, it is even more obvious that the driver isn't involved, because the NIC doesn't even enumerate. This is clearly a faulty system.
Comment 68 Emmanuel Grumbach 2015-04-22 08:12:35 UTC
*** Bug 97081 has been marked as a duplicate of this bug. ***
Comment 69 omazeas 2015-04-22 08:22:27 UTC
So, what you are saying is that nothing can be done through the driver. Is it due to my Linux system or is it due to the network controller hardware to be faulty?

The best workaround seems to be:
> echo 1 > /sys/bus/pci/devices/0000\:00\:03.0/remove
> echo 1 > /sys/bus/pci/rescan
> killall wpa_supplicant
right?

The best way to completely fix this would be to add a usb network controller as a replacement?
Comment 70 Emmanuel Grumbach 2015-04-22 08:27:08 UTC
(In reply to omazeas from comment #69)
> So, what you are saying is that nothing can be done through the driver. Is
> it due to my Linux system or is it due to the network controller hardware to
> be faulty?
> 

Probably your system being faulty. You can try to upgrade the BIOS or something like that.

> The best workaround seems to be:
> > echo 1 > /sys/bus/pci/devices/0000\:00\:03.0/remove
> > echo 1 > /sys/bus/pci/rescan
> > killall wpa_supplicant
> right?
> 
> The best way to completely fix this would be to add a usb network controller
> as a replacement?

you can try.
Comment 71 omazeas 2015-04-22 08:44:27 UTC
Thanks Emmanuel. Your help is highly appreciated!
I have already upgraded the BIOS a few months ago. I was very excited at the time because I did not get any disconnection for a few days after that but it started again.
I might want to try the latest version of the kernel too with the last Ubuntu upgrade although I was reluctant to do that because recently released ones are not long-term supported...
Comment 72 Emmanuel Grumbach 2015-04-22 08:47:17 UTC
I doubt it will help to upgrade the kernel.
Comment 74 Emmanuel Grumbach 2015-04-22 08:51:15 UTC
no
Comment 75 omazeas 2015-04-22 09:31:26 UTC
All right, thanks so much for your expertise.
Comment 76 Sait 2015-04-22 12:51:24 UTC
Did we establish that the same drops occur under Windows on the same machines?
I am still a bit puzzled because my drops don't happen when the laptop is
stationary. Only after leaving it sitting in one place for a while and then
picking it up and moving to another location (excellent signal strength during
the entire path). I created a little shell script with the restart commands in it
and it works fine to restart wifi.
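
The script itself is not attached to this bug. As a rough illustration only, a wrapper around the commands from comment 42 (with the corrected path from comment 50) could look like the sketch below. The function name `reset_pci_device`, the two-second default pause and the overridable sysfs root are assumptions made for this example.

```shell
#!/bin/sh
# Remove a PCI device and rescan the bus so that it is enumerated again.
# $1: PCI address such as 0000:03:00.0 (system specific, see comment 49)
# $2: sysfs PCI root; defaults to /sys/bus/pci
# $3: seconds to wait between remove and rescan (arbitrary default)
reset_pci_device() {
    pci_addr=$1
    pci_root=${2:-/sys/bus/pci}
    pause=${3:-2}
    if [ ! -e "$pci_root/devices/$pci_addr/remove" ]; then
        echo "no such PCI device: $pci_addr" >&2
        return 1
    fi
    echo 1 > "$pci_root/devices/$pci_addr/remove"
    sleep "$pause"
    echo 1 > "$pci_root/rescan"
}

# Typical use as root (commented out so that sourcing has no side effects):
#   reset_pci_device 0000:03:00.0 && killall wpa_supplicant
```

As noted in comment 52, the remove step can take around 15 seconds once the card has stopped responding, so the script may appear to hang for a while.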
Comment 77 omazeas 2015-04-22 15:39:21 UTC
I never really browse the internet the few times I use Windows... My drops can happen both when stationary and while moving, but are more likely when moving.
I am interested in the script if you can make it available, although I am not quite confident yet that the restart commands are effective for me...
Comment 78 Jernej 2015-04-25 18:42:58 UTC
I am experiencing a similar type of problem that occurs quite frequently. In the most recent version of the kernel shipped by Ubuntu the error looks like

========
[12023.651643] ieee80211 phy0: Hardware restart was requested
[12023.651648] iwlwifi 0000:03:00.0: Adding station 00:21:96:55:90:10 failed.
[12023.651654] iwlwifi 0000:03:00.0: Unable to add station 00:21:96:55:90:10 (-5)
[12023.651692] wlan0: failed to insert STA entry for the AP (error -5)
[12023.651767] iwlwifi 0000:03:00.0: L1 Disabled - LTR Disabled
[12023.658736] iwlwifi 0000:03:00.0: Radio type=0x2-0x1-0x0
[12023.932984] iwlwifi 0000:03:00.0: L1 Disabled - LTR Disabled
[12023.940005] iwlwifi 0000:03:00.0: Radio type=0x2-0x1-0x0

========

Full dmesg log is here: https://www.dropbox.com/s/es840ccgkfhn1m9/dmesg.log?dl=0

Can someone confirm it is the same bug? What can I do to help?
Comment 79 Emmanuel Grumbach 2015-04-25 19:22:08 UTC
@Jernej:

Your issue seems slightly different, but it might be related. In your case the firmware doesn't reply to a host command, which is really strange and typically points to a bug / firmware issue. In both cases, there is unfortunately not much I can do.
Comment 80 Richard Smith 2015-05-11 14:15:02 UTC
I appear to be experiencing the same timeout problem.  ie..

 Timeout waiting for hardware access (CSR_GP_CNTRL 0xffffffff)

I have an Advanced-N 6200 in a Lenovo x201 laptop.  

In my case it was a gradual decline.  Wifi issues have been happening periodically for a few months, but a power cycle would fix them. Over the last week it started occurring much more often, and as of this weekend it's daily.

Once it happens, if I reset the wireless with my hardware switch, then it fails to load the firmware.

An earlier post asked for checking if it happened in windows.  My machine still has (mostly unused) Windows 7 home that came with it originally.  I booted that and appeared to get similar behavior.  It works and then dies.

If I do the workaround:

echo 1 > /sys/bus/pci/devices/<mydevice>/remove
echo 1 > /sys/bus/pci/rescan
killall wpa_supplicant

It seems to come back.  I only started looking for solutions this morning, so I don't have a good idea yet how long it will last after banging it on the head with the workaround.

I've noticed a few other quirks lately too like my battery status indicator not behaving correctly, ACPI power off or reboot commands not working, and sometimes when the laptop is suspended it will power off.

I'm running an Ubuntu 12.04.5 LTS system with trusty kernel backport.

~$ uname -a
Linux thinko 3.13.0-52-generic #86~precise1-Ubuntu SMP Tue May 5 18:08:42 UTC 2015 i686 i686 i386 GNU/Linux

I suspect I'm experiencing some sort of embedded controller badness or some other hardware that is about to fail.

If anyone thinks it's worth it to attempt to dig deeper, I can certainly run more tests or try out newer kernels.
Comment 81 Emmanuel Grumbach 2015-05-13 19:56:08 UTC
This seems to say that the problem is in the BIOS / platform.

Have you changed something in your BIOS to make it happen more often?

In any case, I am afraid I have no other choice than closing this bug. Clearly, it is not going anywhere. Keeping it open makes no sense.
I'll leave Intel CCed to this bug so that if someone has something clever to say, we'll still hear him out.
Comment 82 Richard Smith 2015-05-13 20:58:03 UTC
No BIOS changes.  It's the same version that it shipped with.

There does appear to be a later BIOS for my laptop.  I might try that and see if it changes anything.  If it does then I'll report back.
Comment 83 Emmanuel Grumbach 2015-05-17 06:34:42 UTC
*** Bug 98411 has been marked as a duplicate of this bug. ***
Comment 84 kichawa 2015-06-25 08:57:37 UTC
Any update for this bug? This problem is very annoying.

My BIOS is up-to-date (http://support.lenovo.com/us/en/downloads/ds013909) and it's still the same problem with iwlwifi.
Comment 85 Emmanuel Grumbach 2015-06-25 08:59:11 UTC
Unless you are willing to ship the platform to us, there isn't much we can do.
Comment 86 Jernej 2015-06-25 09:02:57 UTC
Not sure if that's relevant or helpful but it turned out that in my case this issue was actually related to a faulty wireless router.
Comment 87 Emmanuel Grumbach 2015-06-25 09:04:03 UTC
(In reply to Jernej from comment #86)
> Not sure if that's relevant or helpful but it turned out that in my case
> this issue was actually related to a faulty wireless router.

That can't be the same issue then :)
Comment 88 crayzeewulf 2015-07-24 21:54:01 UTC
> 
> If I do the workaround:
> 
> echo 1 > /sys/bus/pci/devices/<mydevice>/remove
> echo 1 > /sys/bus/pci/rescan
> killall wpa_supplicant
> 
> It seems to come back.  
>

I had the same problem on a Samsung Chronos Series 7 (700Z3C/700Z5C) laptop with BIOS version P00AAS. The above workaround appears to be working on this machine too.
Comment 89 kichawa 2015-08-14 11:09:48 UTC
I am convinced that this is a hardware problem. I replaced the wifi card and problem is gone.
Comment 90 Richard Smith 2015-08-14 13:27:42 UTC
What model Lenovo do you have?  Did you replace it with the same type of wifi device?

I'm still having the issue. It seems to come and go.  I'd like to try a replacement.
Comment 91 Sait 2015-08-14 13:30:17 UTC
I agree. Could you tell us how you replaced it? I have a Lenovo 440s. Thanks
Comment 92 kichawa 2015-08-14 13:57:41 UTC
My laptop model is x201.
I replaced it with the same type of wifi card - Advanced-N 6200 (rev 35).

Above all, I still don't understand the source of the problem.

In the meantime I tried to check my wifi card in other laptops, but thanks to Lenovo this wifi card could not be installed in an x61s or r61.

So I bought a used wifi card. I inserted it in my laptop and noticed with great relief that the problem was solved.

But...

...the OLD one has been tested in the Dell laptop for 2 days and everything (sic!) was perfect too.

Maybe the problem was with the antenna connector or with mini-PCI?!
Comment 93 Christian Kujau 2016-03-22 03:27:16 UTC
Happened here with an Intel Centrino Wireless-N 2230 (rev c4), built into a Lenovo Thinkpad E431, running Debian/stable with Linux 4.3:

WARNING: CPU: 4 PID: 26813 at /home/zumbi/linux-4.3.5/drivers/net/wireless/iwlwifi/pcie/trans.c:1543 iwl_trans_pcie_grab_nic_access+0x105/0x110 [iwlwifi]()
Timeout waiting for hardware access (CSR_GP_CNTRL 0xffffffff)

And then:

iwlwifi 0000:04:00.0: Log capacity -1515870811 is bogus, limit to 512 entries
iwlwifi 0000:04:00.0: Log write index -1515870811 is bogus, limit to 512
iwlwifi 0000:04:00.0: Start IWL Event Log Dump: display last 20 entries
iwlwifi 0000:04:00.0: flush request fail
iwlwifi 0000:04:00.0: iwl_trans_wait_tx_queue_empty bad state = 0
ieee80211 phy0: Hardware restart was requested
iwlwifi 0000:04:00.0: Fw not loaded - dropping CMD: 18
wlan0: HW problem - can not stop rx aggregation for 2c:b0:5d:93:90:19 tid 0
iwlwifi 0000:04:00.0: Fw not loaded - dropping CMD: 18
wlan0: HW problem - can not stop rx aggregation for 2c:b0:5d:93:90:19 tid 1
iwlwifi 0000:04:00.0: iwl_trans_wait_tx_queue_empty bad state = 0

Full dmesg & lspci output here: http://fpaste.org/343578/86172061/
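
A register read of 0xffffffff typically means the device did not answer on the bus at all, so the warning line is a convenient signature to search for. A small sketch for checking a log for it (the function name is made up for illustration):

```shell
#!/bin/sh
# Succeed if the log text on stdin contains the all-ones CSR_GP_CNTRL
# timeout reported by iwlwifi. -F matches the string literally, so the
# parentheses need no escaping.
has_dead_nic_signature() {
    grep -qF 'Timeout waiting for hardware access (CSR_GP_CNTRL 0xffffffff)'
}

# Typical use (reading dmesg may need root on some systems):
#   dmesg | has_dead_nic_signature && echo "iwlwifi lost access to the NIC"
```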
Comment 94 omazeas 2016-03-22 07:45:03 UTC
I personally solved my problem by installing the rfkill package.
Comment 95 kichawa 2016-03-22 13:28:02 UTC
I bought a new wifi card (the same model). Everything is OK now. But my old one works fine in a friend's laptop.
Comment 96 Johannes Hirte 2016-07-28 19:45:33 UTC
Same problem with a ProBook 645 G2 from HP. I tried to replace a BCM43228 with a 7265 and had to revert the exchange for now. What I've observed is that at home, where I'm nearly alone on my AP, it worked without problems most of the time. The connection ran for days without problems. On a public AP with dozens of customers the error happens within minutes.

Sometimes it looked like the driver tried to reset itself, as if unloading and reloading the module. In these cases it tried to load the wrong firmware and failed. On boot, firmware version 21.302800.0 was loaded. But when the driver tried a reset, there were complaints that version 17.352738.0 couldn't be loaded, without any mention of 21.302800.0 beforehand.
Comment 97 sqweek 2016-09-22 11:53:00 UTC
Created attachment 239411 [details]
dmesg log of first iwlwifi crash on LENOVO 3626C13

I've been seeing this regularly for a while now on my "LENOVO 3626C13/3626C13, BIOS 6QET70WW (1.40 ) 10/11/2012". Removing the pci device via sysfs and rescanning (comment 42) brings wifi back to life so thanks for the workaround - more convenient than restarting.

I've been using this laptop for years and had no such trouble until a few months ago. Let's see if I can figure out about when it started... thankfully archlinux seems to keep kernel logs from the beginning of time around; I have almost two years worth of logs and the first occurrence of wifi problems was two months ago, 2016-07-26. That was 30 days after I upgraded to kernel version 4.6.2-1-ARCH -- before that I was running 4.5.0-1-ARCH. Ever since then it's cropped up every couple of days.

Just prior to that first kernel trace iwlwifi detected and logged a hardware error. It seems to have run into trouble trying to restart the device. I've attached the dmesg logs; interestingly it happened while the laptop was mostly idle -- I had the lid closed (which I've configured not to sleep/hibernate). I was very likely getting on a train at the time, and thus not in range of any access points that I've told wpa_supplicant about.

If there's any other info that would be useful let me know. I haven't read all the comments but I saw some suggestion that this represents a hardware fault, which seems consistent with my observation -- doesn't seem to correlate with any software/firmware change.
Comment 98 Kevin 2016-12-17 01:20:50 UTC
[SOLVED for me]

I had a similar issue pop up a couple weeks ago. For me, I think it was a hardware issue since I was able to solve it by unplugging the cables from my wifi card and then plugging them back in.

I'm sorry if this is actually off topic, but this was my best google search result, and perhaps someone else (or my future self...) may find this helpful. More details below:

For me, after using wifi for a couple hours, my wifi would stop working and network manager would no longer see my wifi card and I wouldn't be able to find it in lspci. Suspending and resuming would then result in a failed resume. The comment 42 workaround didn't work for me.

log: http://pastebin.com/hnwb4EzD
system info: http://pastebin.com/TepNu8tR

for the google bots:
Dec 02 16:46:29 K kernel: iwlwifi 0000:03:00.0: Error sending REPLY_ADD_STA: time out after 2000ms.
Dec 02 16:46:29 K kernel: iwlwifi 0000:03:00.0: Current CMD queue read_ptr 143 write_ptr 144
Dec 02 16:46:29 K kernel: ------------[ cut here ]------------
Dec 02 16:46:29 K kernel: WARNING: CPU: 3 PID: 14198 at drivers/net/wireless/iwlwifi/pcie/trans.c:1552 iwl_trans_pcie_grab_nic_access+0xf2/0x100 [iwlwifi]()
Dec 02 16:46:29 K kernel: Timeout waiting for hardware access (CSR_GP_CNTRL 0xffffffff)
Comment 99 Santiago Piccinini 2016-12-23 13:34:08 UTC
Hi, same issue I think. Exactly as sqweek said, failures started in my lenovo x220 after I upgraded from Linux version 4.5.2-1-ARCH to 4.6.3-1-ARCH.

The problem arises almost every time I boot the notebook, so I've been using 4.4.30-1-lts for a week and the issue doesn't show up.
Comment 100 Santiago Piccinini 2016-12-23 13:38:24 UTC
Created attachment 248431 [details]
dmesg-4.8.13
Comment 101 Santiago Piccinini 2016-12-23 13:41:29 UTC
Created attachment 248441 [details]
lspci_lenovo_x220
Comment 102 scott 2017-02-12 02:33:23 UTC
I fixed this issue on my Lenovo T430 by disabling Wake-on-LAN in the BIOS.

Thanks for the hints about this being a platform issue rather than a driver
bug. I guess the fact that the NICs aren't fully powered down on suspend when
Wake-on-LAN is enabled meant that they were in an unexpected state on resume.
Comment 103 rehan.malak 2020-02-05 11:11:56 UTC
[SOLVED for me]
(In reply to Kevin from comment #98)

wifi was disappearing randomly a few minutes after boot. Sometimes I had to boot several times before seeing wifi. Did not find any solution for weeks...

had the exact same messages as Kevin in dmesg

applied the same solution: opened the Lenovo X201, cleaned it completely, removed the "Intel Centrino Wireless-N 1000 [Condor Peak]" PCI chip, put it back

now my wifi works correctly.


Comment 104 Vyacheslav 2021-03-31 02:29:26 UTC
> echo 1 > /sys/bus/pci/devices/<mydevice>/remove
> echo 1 > /sys/bus/pci/rescan
> killall wpa_supplicant

Works for me, but it is really annoying since I lose the connection every 15-20 minutes, depending on load. But if I switch to 2.4 GHz, no issues at all.